1 INTRODUCTION

1.1 Motivation 

An increasing number of applications are now hosted on the cloud. Some examples are streaming
(Netflix, YouTube), storage (Dropbox, Google Drive) and computing (Amazon EC2, Microsoft Azure)
services. A major advantage of cloud computing and storage is that the large-scale sharing of resources
provides scalability and flexibility. However, an adverse effect of the sharing of resources is the
variability in the latency experienced by the user due to queueing, virtualization, server outages, etc. The
problem becomes further aggravated when the computing job has several parallel tasks, because the 
slowest task becomes the bottleneck in job completion. Thus, ensuring fast and seamless service is a 
challenging problem in cloud systems. 

One method to reduce latency that has gained significant attention in recent years is the use of 
redundancy. In cloud computing, replicating a task on multiple machines and waiting for the earliest 
copy to finish can significantly reduce the latency [Dean and Barroso 2013]. Similarly, in cloud storage
systems, requests to access content can be assigned to multiple replicas, such that it suffices
to download any one replica. However, redundancy can result in increased use of resources such as
computing time and network bandwidth. In frameworks such as Amazon EC2 and Microsoft Azure which
offer computing as a service, the computing time spent on a job is proportional to the cost of renting 
the machines. 

1.2 Organization of this Work

In this work we aim to understand the trade-off between latency and computing cost, and propose 
efficient strategies to add redundancy. We focus on a redundancy model called the (n, k) fork-join model, 
where a job is forked into n tasks such that completion of any k tasks is sufficient to finish the job. In 
Section 2 we formally define this model and its variants. Section 3 summarizes related previous work 
and our contributions. Section 4 gives the key preliminary concepts used in this work. 

The rest of the paper studies different variants of the (n, k) fork-join model in increasing order of 
generality, as shown in Table I. In Section 5 and Section 6 we focus on the k = 1 (replicated) case.
Section 5 considers full replication of a job at all n servers, and compares different strategies of canceling
redundant tasks. In Section 6 we consider partial replication at r out of n servers.

a Schlumberger Faculty for the Future Fellowship. This work was presented in part at the Allerton Conference on
Communication, Control and Computing 2015, and ACM Sigmetrics Mathematical Modeling and Analysis Workshop 2015.
Authors' email addresses: Gauri Joshi: gaurij@andrew.cmu.edu (this author was at MIT at the time of this work);
Emina Soljanin: emina.soljanin@rutgers.edu; Gregory W. Wornell: gww@mit.edu

Table I: Organization of main latency-cost analysis results presented in the rest of the paper. We fork each job into tasks at
all n servers (full forking), or to some subset r out of n servers (partial forking). A job is complete when any k of its tasks are
served.

                               | k = 1 (Replicated) Case               | General k
-------------------------------+---------------------------------------+---------------------------------------
Full forking to all n servers  | Section 5: Comparison of strategies   | Section 7: Bounds on latency and cost,
                               | with and without early task           | and the diversity-parallelism
                               | cancellation                          | trade-off
-------------------------------+---------------------------------------+---------------------------------------
Partial forking to r out of n  | Section 6: Effect of r and the choice | Section 8: General redundancy strategy
servers                        | of servers on latency and cost        | for cost-efficient latency reduction


In Section 7 and Section 8, we move to the general k case, which requires a significantly different
style of analysis than the k = 1 case. In Section 7 we consider full forking to all n servers, and
determine bounds on latency and cost, generalizing some of the fundamental work on fork-join queues. For
partial forking, we propose a general redundancy strategy in Section 8. System designers looking for a
practical redundancy strategy rather than theoretical analysis may skip ahead to Section 8 after the
problem setup in Section 2.

Finally, Section 9 summarizes the results and provides future perspectives. Properties and examples 
of log-concavity are given in Appendix A. Proofs of the k = 1 and general k cases are deferred to 
Appendix B and Appendix C respectively. 

2 SYSTEM MODEL 

2.1 Fork-Join Model and its Variants 

Definition 1 ((n, k) fork-join system). Consider a distributed system with n statistically
identical servers. Jobs arrive to the system at rate $\lambda$, according to a Poisson process^1. Each job is forked into n
tasks that join first-come first-served queues at each of the n servers. The job is said to be complete when
any k tasks are served. At this instant, all remaining tasks are canceled and abandon their respective
queues immediately.

After a task of the job reaches the head of its queue, the time taken to serve it can be random due to 
various factors such as disk seek time and sharing of computing resources between multiple processes. 
We model this service time by a random variable X > 0, with cumulative distribution function (CDF)
$F_X(x)$. The tail distribution (complementary CDF) of X is denoted by $\bar{F}_X(x) = \Pr(X > x)$. We use $X_{k:n}$ to
denote the $k$-th smallest of n i.i.d. random variables $X_1, X_2, \ldots, X_n$.

We assume that the service time X is i.i.d. across tasks and servers. Thus, if a task is replicated 
at two different servers, the service times of the replicas are independent and identically distributed. 
Dependence of service time on the task itself can be modeled by adding a constant $\Delta$ to X. More
generally, $\Delta$ may be a random variable. Although we do not consider this case here, the results in this
paper (particularly Section 5) can be extended to consider correlated service times. 

Fig. 1 illustrates the (3, 2) fork-join system. The job exits the system when any 2 out of 3 tasks are
complete. The k = 1 case corresponds to a replicated system where a job is sent to all n servers and
we wait for one of the replicas to be served. The (n, k) fork-join system with k > 1 can serve as a
useful model to study content access latency from an (n, k) erasure coded distributed storage system.


^1 The Poisson assumption is required only for the exact analysis and bounds on latency (equations (5), (7), (8), (11), (16), (17),
and (24)). All other results on E[C], and the comparison of replication strategies in heavy traffic, hold for any arrival process.

Fig. 1: The (3, 2) fork-join system. When any 2 out of 3 tasks
of a job are served (as seen for Job A on the right), the third 
task abandons its queue and the job exits the system. 

Fig. 2: The (3, 2) fork-early-cancel system. When any 2 out of
3 tasks of a job are in service, the third task abandons (seen 
for Job A on the left, and Job B on the right). 


Approximate computing applications that require only a fraction of tasks of a job to be complete can 
also be modeled using the (n, k) fork-join system. 

We consider the following two variants of this system, which can reduce the redundant
time that the servers spend on each job.

(1) (n, k) fork-early-cancel system: Instead of waiting for k tasks to finish, the redundant tasks are
canceled when any k tasks reach the heads of their queues and start service. If more than k tasks 
start service simultaneously, we retain any k chosen uniformly at random. Fig. 2 illustrates the 
(3,2) fork-early-cancel system. In Section 5 we compare the (n, k) systems with and without early 
cancellation. 

(2) (n, r, k) partial fork-join system: Each incoming job is forked to $r \geq k$ out of the n servers. When
any k tasks finish service, the redundant tasks are canceled immediately and the job exits the 
system. The r servers can be chosen according to different scheduling policies such as random, 
round-robin, least-work-left (see [Harchol-Balter 2013, Chapter 24] for definitions) etc. In Section 6 
we develop insights into the best choice of r, and the scheduling policy. 

Other variants of the fork-join system include a combination of partial forking and early cancellation, 
or delaying invocation of some of the redundant tasks. Although not studied in detail here, our analysis 
techniques can be extended to these variants. In Section 8 we propose a general redundancy strategy 
that is a combination of partial forking and early cancellation. 

2.2 Latency and Cost Metrics 

We now define the metrics of the latency and resource cost whose trade-off is analyzed in the rest of 
the paper. 

Definition 2 (Latency). The latency T is defined as the time from the arrival of a job until it is 
served. In other words, it is the response time experienced by the job. 

In this paper we focus on analyzing the expected latency E [T]. Although E [T] is a good indicator of 
the average behavior, system designers are often interested in the tail Pr(T > t) of the latency. For 
many queueing problems, determining the distribution of response time T requires the assumption of 
exponential service time. In order to consider arbitrary, non-exponential service time distribution Fx, 
we settle for analyzing E [T] here. 

Definition 3 (Computing Cost). The computing cost C is the total time spent by the servers
serving a job, not including the time spent in the queue.

In computing-as-a-service frameworks, the computing cost is proportional to money spent on renting
machines to run a job on the cloud^2.

3 PREVIOUS WORK AND MAIN CONTRIBUTIONS 
3.1 Related Previous Work 

Systems Work: The use of redundancy to reduce latency is not new. One of the earliest instances is the
use of multiple routing paths [Maxemchuk 1975] to send packets in networks; see [Kabatiansky et al. 
2005, Chapter 7] for a detailed survey of other related work. A similar idea has been studied [Vulimiri 
et al. 2013] in the context of DNS queries. In large-scale cloud computing frameworks, several recent 
works in systems [Dean and Ghemawat 2008; Ananthanarayanan et al. 2013; Ousterhout et al. 2013] 
explore straggler mitigation techniques where redundant replicas of straggling tasks are launched to 
reduce latency. Although the use of redundancy has been explored in systems literature, there is little
work on the rigorous analysis of how it affects latency, and in particular the cost of resources. Next we 
review some of that work. 

Exponential Service Time: The (n, k) fork-join system was first proposed in [Joshi et al. 2012; 
Joshi et al. 2014] to analyze content download latency from erasure coded distributed storage. These 
works consider that a content file coded into n chunks can be recovered by accessing any k out of the n 
chunks, where the service time X of each chunk is exponential. Even with the exponential assumption,
analyzing the (n, k) fork-join system is a hard problem. It is a generalization of the (n, n) fork-join
system, which was actively studied in queueing literature [Flatto and Hahn 1984; Nelson and Tantawi 
1988; Varki et al. 2008] around two decades ago. 

Recently, an analysis of latency with heterogeneous job classes for the replicated (k = 1) case with 
distributed queues is presented in [Gardner et al. 2015]. Other related works include [Shah et al. 2014; 
Kumar et al. 2014; Xiang et al. 2014; Kadhe et al. 2015]. A common thread in all these works is that 
they also assume exponential service time. 

General Service Time: Few practical systems have exponentially distributed service time. For
example, studies of download time traces from Amazon S3 [Liang and Kozat 2014; Chen et al. 2014]
indicate that the service time is not exponential in practice, but instead a shifted exponential. For
service time distributions that are 'new-worse-than-used' [Gao and Wang 1991], it is shown in [Koole and
Righter 2008] that it is optimal to replicate a job at all servers in the system. The choice of
scheduling policy for new-worse-than-used (NWU) and new-better-than-used (NBU) distributions is studied
in [Kim et al. 2009; Shah et al. 2013; Sun et al. 2015]. The NBU and NWU notions are closely related 
to the log-concavity of service time studied in this work. 

The Cost of Redundancy: If we assume exponential service time then redundancy does not cause 
any increase in cost of server time. But since this is not true in practice, it is important to determine the 
cost of using redundancy. Simulation results with non-zero fixed cost of removal of redundant requests 
are presented in [Shah et al. 2013]. The expected computing cost E [C] spent per job was previously 
considered in [Wang et al. 2014; Wang et al. 2015] for a distributed system without considering queueing
of requests. In [Joshi et al. 2015] we presented an analysis of the latency and cost of the (n, k)
fork-join system with and without early cancellation of redundant tasks.


^2 Although we focus on this cost metric, we note that redundancy also results in a network cost of making Remote Procedure
Calls (RPCs) to assign tasks of a job and to cancel redundant tasks. It is proportional to the number of servers each job is
forked to, which is n for the (n, k) fork-join model described above. In the context of distributed storage, redundancy also results 
in increased use of storage space, proportional to n/k. The trade-off between delay and storage is studied in [Joshi et al. 2012; 
Joshi et al. 2014]. 



Efficient Redundancy Techniques for Latency Reduction in Cloud Systems 




Table II: Latency-optimal and cost-optimal redundancy strategies for the k = 1 (replicated) case. 'Canceling redundancy early'
means that instead of waiting for any 1 task to finish, we cancel redundant tasks as soon as any 1 task begins service.

                      | Log-concave service time                        | Log-convex service time
                      | Latency-optimal                | Cost-optimal   | Latency-optimal     | Cost-optimal
----------------------+--------------------------------+----------------+---------------------+--------------------
Cancel redundancy     | Low load: Keep redundancy      | Cancel early   | Keep redundancy     | Keep redundancy
early or keep it?     | High load: Cancel early        |                |                     |
----------------------+--------------------------------+----------------+---------------------+--------------------
Partial forking to r  | Low load: r = n (fork to all)  | r = 1          | r = n (fork to all) | r = n (fork to all)
out of n servers      | High load: r = 1 (fork to one) |                |                     |



3.2 Main Contributions 

The main differences between this and previous works are: 1) we consider a general service time 
distribution, instead of exponential service time and, 2) we analyze the impact of redundancy on the 
latency, as well as the computing cost (total server time spent per job). Incidentally, our computing cost 
metric E [C] also serves as a powerful tool to compare different redundancy strategies under high load. 

The latency-cost analysis of the fork-join system and its variants gives us the insight that the
log-concavity (and respectively, the log-convexity) of $\bar{F}_X$, the tail distribution of service time, is a key factor
in choosing the redundancy strategy. Here are some examples, which are also summarized in Table II.

—By comparing the (n, 1) systems (fork to n, wait for any 1) with and without early cancellation, we 
show that early cancellation of redundancy can reduce both latency and cost for log-concave $\bar{F}_X$,
but it is not effective for log-convex $\bar{F}_X$.

—For the (n, r, 1) partial-fork-join system (fork to r out of n, wait for any 1), we can show that forking 
to more servers (larger r) is both latency and cost optimal for log-convex $\bar{F}_X$. But for log-concave $\bar{F}_X$,
larger r reduces latency only in the low traffic regime, and always increases the computing cost.

Using these insights we also develop a general redundancy strategy to decide how many servers to 
fork to, and when to cancel the redundant tasks, for an arbitrary service time that may be neither 
log-concave nor log-convex. 

4 PRELIMINARY CONCEPTS 

We now present some preliminary concepts that are vital to understanding the results presented in 
the rest of the paper. 

4.1 Using E [C] to Compare Systems 

Since the cost metric E [C] is the expected time spent by servers on each job, higher E [C] implies 
higher expected waiting time for subsequent jobs. Thus, E [C] can be used to compare the latency with 
different redundancy policies in the heavy traffic regime. In particular, we compare policies that are 
symmetric across the servers, defined formally as follows. 

Definition 4 (Symmetric Policy). With a symmetric scheduling policy, the tasks of each job are 
forked to one or more servers such that the expected task arrival rate is equal across all the servers. 

Most commonly used policies, such as random, round-robin, and join-the-shortest-queue (JSQ), are symmetric
across the n servers. In Lemma 1, we express the stability region of the system in terms of E[C].

Lemma 1 (Stability Region in terms of E[C]). A system of n servers with a symmetric redundancy
policy is stable, that is, the mean response time $E[T] < \infty$, only if the arrival rate $\lambda$ (with any
arrival process) satisfies

$$\lambda < \frac{n}{E[C]}. \qquad (1)$$

Thus, the maximum arrival rate that can be supported is $\lambda_{\max} = n / E[C]$, where E[C] depends on the
redundancy scheduling policy.



Proof of Lemma 1. For a symmetric policy the mean time spent by each server per job is E[C]/n.
Thus the server utilization is $\rho = \lambda E[C] / n$. By the server utilization version of Little's Law, $\rho$ must be
less than 1 for the system to be stable. The result follows from this. □
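
As a quick numerical illustration of Lemma 1, the sketch below evaluates the stability bound and the per-server utilization. The function names and the sample values (4 servers, E[C] = 2.5 time units per job) are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def max_arrival_rate(n: int, expected_cost: float) -> float:
    """Stability bound of Lemma 1: lambda_max = n / E[C]."""
    if n <= 0 or expected_cost <= 0:
        raise ValueError("n and E[C] must be positive")
    return n / expected_cost


def utilization(arrival_rate: float, n: int, expected_cost: float) -> float:
    """Per-server utilization rho = lambda * E[C] / n under a symmetric policy."""
    return arrival_rate * expected_cost / n


# Hypothetical example values (assumptions for illustration only).
example_lambda_max = max_arrival_rate(4, 2.5)
example_rho = utilization(1.2, 4, 2.5)
```

A policy with a smaller E[C] yields a larger bound, which is why E[C] can be used to rank policies under heavy load.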

Definition 5 (Service Capacity $\lambda^*_{\max}$). The service capacity of the system $\lambda^*_{\max}$ is the maximum
achievable $\lambda_{\max}$ over all symmetric policies.


From Lemma 1 and Definition 5 we can infer Corollary 1 below. 

Corollary 1. The redundancy strategy that minimizes E[C] results in the lowest E[T] in the heavy
traffic regime ($\lambda \to \lambda^*_{\max}$).

Note that as $\lambda$ approaches $\lambda^*_{\max}$, the expected latency $E[T] \to \infty$ for all strategies whose $\lambda_{\max} < \lambda^*_{\max}$.


4.2 Log-concavity of $\bar{F}_X$

If we fork a job to all r idle servers and wait for any 1 copy to finish, the expected computing cost
$E[C] = r E[X_{1:r}]$, where $X_{1:r} = \min(X_1, X_2, \ldots, X_r)$, the minimum of r i.i.d. realizations of random
variable X. The behavior of this cost function depends on whether the tail distribution $\bar{F}_X$ of service
time is 'log-concave' or 'log-convex'. Log-concavity of $\bar{F}_X$ is defined formally as follows.

Definition 6 (Log-concavity and log-convexity of $\bar{F}_X$). The tail distribution $\bar{F}_X$ is said to
be log-concave (log-convex) if $\log \Pr(X > x)$ is concave (convex) in x for all $x \in [0, \infty)$.

For brevity, when we say X is log-concave (log-convex) in this paper, we mean that $\bar{F}_X$ is log-concave
(log-convex). Lemma 2 below describes how $r E[X_{1:r}]$ varies with r for log-concave (log-convex) $\bar{F}_X$.

Lemma 2 (Expected Minimum). If X is log-concave (log-convex), then $r E[X_{1:r}]$ is non-decreasing
(non-increasing) in r.

The proof of Lemma 2 can be found in Appendix A. Note that the exponential distribution is both
log-concave and log-convex, and thus $r E[X_{1:r}]$ remains constant as r varies. This can also be seen from
the fact that when $X \sim \mathrm{Exp}(\mu)$, an exponential with rate $\mu$, $X_{1:r}$ is an exponential with rate $r\mu$. Then,
$r E[X_{1:r}] = 1/\mu$, a constant independent of r.

Log-concave and log-convex distributions have been studied in economics and reliability theory and 
have many interesting properties. Properties relevant to this work are given in Appendix A. We refer 
readers to [Bagnoli and Bergstrom 2005]. In Remark 1 we highlight one key property that provides 
intuitive understanding of log-concavity. 

Remark 1. It is well-known that the exponential distribution is memoryless. Log-concave
distributions have 'optimistic memory', that is, the expected remaining service time of a task decreases with the
time elapsed. On the other hand, log-convex distributions have 'pessimistic memory'.

Distributions with optimistic memory are referred to as 'new-better-than-used' [Koole and Righter
2008], 'light-everywhere' [Shah et al. 2013], or 'new-longer-than-used' [Sun et al. 2015]. Log-concavity
of X implies that X is 'new-better-than-used' (see Property 3 in Appendix A for the proof).

A natural question is: what are examples of log-concave and log-convex distributions that arise
in practice? A canonical example of a log-concave distribution is the shifted exponential distribution
ShiftedExp($\Delta, \mu$), which is an exponential with rate $\mu$ plus a constant shift $\Delta > 0$. Recent
work [Liang and Kozat 2014; Chen et al. 2014] on analysis of content download from Amazon S3
observed that X is shifted exponential, where $\Delta$ is proportional to the size of the content and the
exponential part is the random delay in starting the data transfer. Another example of log-concave
service time is the uniform distribution over any convex set.

Log-convex service times occur when there is high variability in service time. CPU service times
are often approximated by the hyperexponential distribution, which is a mixture of two or more
exponentials. In this paper we focus on mixtures of two exponentials with decay rates $\mu_1$ and $\mu_2$
respectively, where the exponential with rate $\mu_1$ occurs with probability p. We denote this distribution by
HyperExp($\mu_1, \mu_2, p$). If a server is generally fast (rate $\mu_1$) but it can slow down (rate $\mu_2 < \mu_1$) with
probability 1 - p, then the overall service time distribution would be $X \sim$ HyperExp($\mu_1, \mu_2, p$).
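
To make Lemma 2 concrete for the two families just introduced, the following sketch evaluates the cost $r E[X_{1:r}]$ in closed form. For the shifted exponential, the minimum of r copies is the shift plus an exponential with rate $r\mu$. For the hyperexponential, the sketch integrates the r-th power of the tail after a binomial expansion. The function names and parameter values are illustrative assumptions, not taken from the paper.

```python
from math import comb


def cost_shifted_exp(r: int, delta: float, mu: float) -> float:
    """r * E[X_{1:r}] for X = delta + Exp(mu); the minimum is delta + Exp(r * mu)."""
    return r * (delta + 1.0 / (r * mu))


def cost_hyperexp(r: int, mu1: float, mu2: float, p: float) -> float:
    """r * E[X_{1:r}] for a mixture of Exp(mu1) (prob. p) and Exp(mu2) (prob. 1 - p).

    Uses E[X_{1:r}] = integral over x >= 0 of Pr(X > x)^r, with
    Pr(X > x) = p * exp(-mu1 * x) + (1 - p) * exp(-mu2 * x),
    expanded with the binomial theorem and integrated term by term.
    """
    expected_min = sum(
        comb(r, j) * p**j * (1 - p) ** (r - j) / (j * mu1 + (r - j) * mu2)
        for j in range(r + 1)
    )
    return r * expected_min


# Illustrative parameters (assumptions, not values from the paper).
shifted = [cost_shifted_exp(r, delta=1.0, mu=0.5) for r in range(1, 6)]
hyper = [cost_hyperexp(r, mu1=1.5, mu2=0.1, p=0.5) for r in range(1, 6)]
expo = [cost_shifted_exp(r, delta=0.0, mu=0.5) for r in range(1, 6)]
```

With these inputs the `shifted` list grows with r, the `hyper` list shrinks with r, and the `expo` list stays at $1/\mu$, matching the three cases discussed around Lemma 2.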

Many practical systems also have service times that are neither log-concave nor log-convex. In this
paper we use the Pareto distribution Pareto($x_m, \alpha$) as an example of such distributions. Its tail
distribution is given by

$$\Pr(X > x) = \begin{cases} \left(\dfrac{x_m}{x}\right)^{\alpha} & x \geq x_m, \\ 1 & \text{otherwise.} \end{cases} \qquad (2)$$

The tail distribution in (2) is log-convex for $x \geq x_m$, but not for all $x \geq 0$ due to the initial delay of $x_m$.
Thus, overall the Pareto distribution is neither log-concave nor log-convex.
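
The claim that the Pareto tail is neither log-concave nor log-convex can be checked numerically with a midpoint test on the log-tail. The sketch below assumes the standard Pareto tail, equal to 1 below $x_m$ and $(x_m/x)^{\alpha}$ above it; the helper names and the parameters $x_m = 1$, $\alpha = 2$ are illustrative assumptions.

```python
import math


def pareto_log_tail(x: float, x_m: float, alpha: float) -> float:
    """log Pr(X > x) for Pareto(x_m, alpha): 0 below x_m, alpha * log(x_m / x) above."""
    if x < x_m:
        return 0.0
    return alpha * (math.log(x_m) - math.log(x))


def midpoint_gap(f, a: float, b: float) -> float:
    """f at the midpoint minus the chord value.

    A positive gap rules out convexity on [a, b]; a negative gap rules out concavity.
    """
    return f((a + b) / 2) - (f(a) + f(b)) / 2


def log_tail(x: float) -> float:
    # Illustrative parameters: x_m = 1, alpha = 2 (assumptions).
    return pareto_log_tail(x, 1.0, 2.0)


gap_across_shift = midpoint_gap(log_tail, 0.5, 1.5)  # interval straddling x_m
gap_in_tail = midpoint_gap(log_tail, 2.0, 4.0)       # interval beyond x_m
```

The two gaps have opposite signs, so the log-tail is neither convex nor concave over the whole half-line.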

Remark 2. Log-concave (log-convex) distributions are reminiscent of another well-known class of
distributions: light (heavy) tailed distributions. Many random variables with log-concave (log-convex)
$\bar{F}_X$ are light (heavy) tailed respectively, but neither property implies the other. For example, the Pareto
distribution defined above is heavy tailed but is neither log-concave nor log-convex. While the tail of a
distribution characterizes how the maximum $E[X_{n:n}]$ behaves for large n, log-concavity (log-convexity)
of $\bar{F}_X$ characterizes the behavior of the minimum $E[X_{1:n}]$, which is of primary interest in this work.


4.3 Relative Task Start Times 

Since the tasks of the job experience different waiting times in their respective queues, they start being
served at different times. The relative start times of the n tasks of a job are an important factor affecting
the latency and cost. We denote the relative start times by $t_1 \leq t_2 \leq \cdots \leq t_n$, where $t_1 = 0$ without loss
of generality. For instance, if n = 3 tasks start at absolute times 3, 4 and 7, then their relative start
times are $t_1 = 0$, $t_2 = 4 - 3 = 1$ and $t_3 = 7 - 3 = 4$. In the case of partial forking when only r tasks are
invoked, we can consider $t_{r+1}, \ldots, t_n$ to be $\infty$.
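
The bookkeeping in this example can be written as a small helper. The function name and its optional argument `n` are hypothetical; tasks that are never invoked are given an infinite start time, following the partial-forking convention described above.

```python
import math


def relative_start_times(absolute_starts, n=None):
    """Relative start times t_1 <= ... <= t_n, with t_1 = 0.

    absolute_starts: start times of the tasks that were actually invoked.
    n: total number of servers; tasks never invoked get start time infinity.
    """
    first = min(absolute_starts)
    rel = sorted(t - first for t in absolute_starts)
    if n is not None:
        rel += [math.inf] * (n - len(rel))
    return rel


full = relative_start_times([3, 4, 7])          # the n = 3 example from the text
partial = relative_start_times([4, 3], n=3)     # only r = 2 of n = 3 tasks invoked
```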

For the replicated case (k = 1), let S be the time from when the earliest replica of a task starts
service, until any one replica finishes. It is the minimum of $X_1 + t_1, X_2 + t_2, \ldots, X_n + t_n$, where the $X_i$ are
i.i.d. with distribution $F_X$. The tail distribution Pr(S > s) is given by

$$\Pr(S > s) = \prod_{i=1}^{n} \Pr(X > s - t_i). \qquad (3)$$


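The computing cost C can be expressed in terms of S and $t_i$ as follows.

$$C = S + |S - t_2|^{+} + \cdots + |S - t_n|^{+}, \qquad (4)$$

where $|a|^{+} = \max(a, 0)$, since the server running the i-th task works on the job from time $t_i$ until time S.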
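
A minimal sketch of how S and C could be computed for one job in the k = 1 case is given below. It assumes that the server running the i-th replica is busy with the job from its relative start time $t_i$ until the first replica finishes at time S, so that it contributes max(S - t_i, 0) to the cost. The function names and sample numbers are hypothetical.

```python
def first_finish_time(service_times, start_times):
    """S = min_i (X_i + t_i): time until the first replica finishes."""
    return min(x + t for x, t in zip(service_times, start_times))


def computing_cost(service_times, start_times):
    """Total server time spent on the job.

    Assumption: server i works on the job from t_i until S, and replicas that
    have not started by time S are canceled without consuming server time.
    """
    s = first_finish_time(service_times, start_times)
    return sum(max(s - t, 0.0) for t in start_times)


# Hypothetical sample: service times X_i and relative start times t_i.
cost_staggered = computing_cost([5.0, 2.0, 6.0], [0.0, 1.0, 4.0])
cost_simultaneous = computing_cost([5.0, 2.0, 6.0], [0.0, 0.0, 0.0])
```

In the staggered sample the second replica finishes first at S = 3, so the servers spend 3, 2 and 0 time units respectively. When all replicas start together the cost is n times the smallest service time.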

Fig. 3: Equivalence of the (n, 1) fork-join system with an M/G/1 queue with service time $X_{1:n}$, the minimum of n i.i.d. random
variables $X_1, X_2, \ldots, X_n$.


Using (4) we get several crucial insights in the rest of the paper. For instance, in Section 6 we show
that when $\bar{F}_X$ is log-convex, having $t_1 = t_2 = \cdots = t_n = 0$ gives the lowest E[C]. Then using Lemma 1
we can infer that it is optimal to fork a job to all n servers when $\bar{F}_X$ is log-convex.


5 k = 1 CASE WITHOUT AND WITH EARLY CANCELLATION

In this section we analyze the latency and cost of the (n, 1) fork-join system, and the (n, 1)
fork-early-cancel system defined in Section 2. We get the insight that it is better to cancel redundant tasks early
if $\bar{F}_X$ is log-concave. On the other hand, if $\bar{F}_X$ is log-convex, retaining the redundant tasks is better.


5.1 Latency-Cost Analysis 

Theorem 1. The expected latency and computing cost of an (n, 1) fork-join system are given by

$$E[T] = E[T^{M/G/1}] = E[X_{1:n}] + \frac{\lambda E[X_{1:n}^2]}{2(1 - \lambda E[X_{1:n}])}, \qquad (5)$$

$$E[C] = n \cdot E[X_{1:n}], \qquad (6)$$

where $X_{1:n} = \min(X_1, X_2, \ldots, X_n)$ for i.i.d. $X_i \sim F_X$.

Proof. Consider the first job that arrives to an (n, 1) fork-join system when all servers are idle. The
n tasks of this job start service simultaneously at their respective servers. The earliest task finishes
after time $X_{1:n}$, and all other tasks are canceled immediately. So, the tasks of all subsequent jobs
arriving to the system also start simultaneously at the n servers as illustrated in Fig. 3. Hence, the arrival
and departure events, and the latency of an (n, 1) fork-join system are equivalent in distribution to those of an
M/G/1 queue with service time $X_{1:n}$.

The expected latency of an M/G/1 queue is given by the Pollaczek-Khinchine formula (5). The
expected cost $E[C] = n E[X_{1:n}]$ because each of the n servers spends $X_{1:n}$ time on the job. This can also
be seen by noting that $S = X_{1:n}$ when $t_i = 0$ for all i, and thus by (4), $C = n X_{1:n}$. □
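
To see Theorem 1 in numbers, the sketch below evaluates (5) and (6) for shifted exponential service, where the minimum of n replicas is the shift plus an exponential with rate $n\mu$. The function names are hypothetical, and the shift value 1.0 used in the sample is an assumption chosen for illustration.

```python
def shifted_exp_min_moments(n, delta, mu):
    """First and second moments of X_{1:n} for X = delta + Exp(mu)."""
    rate = n * mu  # the minimum of n exponentials with rate mu has rate n * mu
    first = delta + 1.0 / rate
    second = delta**2 + 2.0 * delta / rate + 2.0 / rate**2
    return first, second


def fork_join_latency_cost(n, lam, delta, mu):
    """E[T] from the Pollaczek-Khinchine formula (5) and E[C] from (6)."""
    m1, m2 = shifted_exp_min_moments(n, delta, mu)
    rho = lam * m1
    if rho >= 1.0:
        raise ValueError("unstable: lam * E[X_{1:n}] must be below 1")
    latency = m1 + lam * m2 / (2.0 * (1.0 - rho))
    cost = n * m1
    return latency, cost


# Sample inputs: lam = 0.25 and mu = 0.5, with an assumed shift delta = 1.0.
single = fork_join_latency_cost(1, 0.25, 1.0, 0.5)
double = fork_join_latency_cost(2, 0.25, 1.0, 0.5)
```

Going from n = 1 to n = 2 lowers the latency while raising the cost, which is the trade-off expected for a log-concave service time with a positive shift.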


From (5) it is easy to see that for any service time distribution $F_X$, the expected latency E[T] is
non-increasing with n. The behavior of E[C] follows from Lemma 2 as given by Corollary 2 below.

Corollary 2. If $\bar{F}_X$ is log-concave (log-convex), then E[C] is non-decreasing (non-increasing) in n.

Fig. 4 and Fig. 5 show analytical plots of the expected latency versus cost for log-concave and
log-convex $\bar{F}_X$ respectively. In Fig. 4, the arrival rate $\lambda = 0.25$, and X is shifted exponential ShiftedExp($\Delta$, 0.5),
with different values of $\Delta$. For $\Delta > 0$, there is a trade-off between expected latency and cost. Only
when $\Delta = 0$, that is, X is a pure exponential (which is generally not true in practice), can we
reduce latency without any additional cost. In Fig. 5, the arrival rate $\lambda = 0.5$, and X is hyperexponential
HyperExp(0.4, 0.5, $\mu_2$) with different values of $\mu_2$. We get a simultaneous reduction in E[T] and E[C] as
n increases. The cost reduction is steeper as $\mu_2$ increases.

Fig. 4: The service time $X \sim \Delta + \mathrm{Exp}(\mu)$ (log-concave), with
$\mu = 0.5$, $\lambda = 0.25$. As n increases along each curve, E[T]
decreases and E[C] increases. Only when $\Delta = 0$ does latency
reduce at no additional cost.

Fig. 5: The service time $X \sim \mathrm{HyperExp}(0.4, \mu_1, \mu_2)$ (log-convex),
with $\mu_1 = 0.5$, different values of $\mu_2$, and $\lambda = 0.5$.
Expected latency and cost both reduce as n increases along
each curve.


Instead of holding the arrival rate $\lambda$ constant, if we consider that it scales linearly with n, then the
latency E[T] may not always decrease with n. In Corollary 3 we study the behavior as n varies.


Corollary 3. If the arrival rate $\lambda = \lambda_0 n$, scaling linearly with n, then the latency E[T] decreases
with n if $\bar{F}_X$ is log-convex. If $\bar{F}_X$ is log-concave then E[T] increases with n in heavy traffic.


Proof. If $\lambda = \lambda_0 n$, then the latency E[T] in (5) can be rewritten as

$$E[T] = E[X_{1:n}] + \frac{\lambda_0 n E[X_{1:n}^2]}{2(1 - \lambda_0 n E[X_{1:n}])}. \qquad (7)$$

If $\bar{F}_X$ is log-convex then by Lemma 2 we know that $n E[X_{1:n}]$ decreases with n. Similarly, $n E[X_{1:n}^2]$
also decreases with n (the proof is similar to that of Lemma 2). Hence, we can conclude that the latency
in (7) decreases with n for log-convex $\bar{F}_X$. On the other hand, if $\bar{F}_X$ is log-concave, then $n E[X_{1:n}]$ and
$n E[X_{1:n}^2]$ increase with n. Thus, in the heavy traffic regime ($\lambda \to \lambda_{\max}$), when the second term in (7)
dominates, E[T] increases with n. □
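
The two cases of Corollary 3 can be illustrated numerically by evaluating the Pollaczek-Khinchine latency with arrival rate $\lambda_0 n$. The sketch below uses closed-form first and second moments of $X_{1:n}$ for a shifted exponential (log-concave) and a two-branch hyperexponential (log-convex). All function names and parameter values are illustrative assumptions rather than settings from the paper.

```python
from math import comb


def shifted_exp_min_moments(n, delta, mu):
    """First and second moments of X_{1:n} for X = delta + Exp(mu)."""
    rate = n * mu
    return delta + 1.0 / rate, delta**2 + 2.0 * delta / rate + 2.0 / rate**2


def hyperexp_min_moments(n, mu1, mu2, p):
    """First and second moments of X_{1:n} for a mixture of Exp(mu1) and Exp(mu2)."""
    m1 = m2 = 0.0
    for j in range(n + 1):
        weight = comb(n, j) * p**j * (1 - p) ** (n - j)
        rate = j * mu1 + (n - j) * mu2
        m1 += weight / rate            # integral of exp(-rate * x)
        m2 += 2.0 * weight / rate**2   # integral of 2 * x * exp(-rate * x)
    return m1, m2


def scaled_latency(n, lam0, moments):
    """Pollaczek-Khinchine latency when the arrival rate is lam0 * n."""
    m1, m2 = moments
    lam = lam0 * n
    if lam * m1 >= 1.0:
        raise ValueError("unstable at this load")
    return m1 + lam * m2 / (2.0 * (1.0 - lam * m1))


# Illustrative parameters (assumptions, not taken from the paper's figures).
convex = [scaled_latency(n, 0.05, hyperexp_min_moments(n, 1.0, 0.1, 0.5)) for n in (1, 2)]
concave = [scaled_latency(n, 0.2, shifted_exp_min_moments(n, 1.0, 0.5)) for n in (1, 2)]
```

With these inputs the hyperexponential latency falls from n = 1 to n = 2, while the shifted exponential latency rises because the load $\lambda_0 n E[X_{1:n}]$ moves closer to 1.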


5.2 Early Task Cancellation 

We now analyze the (n, 1) fork-early-cancel system, where we cancel redundant tasks as soon as any 
task reaches the head of its queue. Intuitively, early cancellation can save computing cost, but the 
latency could increase due to the loss of diversity advantage provided by retaining redundant tasks. 
Comparing it to the (n, 1) fork-join system, we gain the insight that early cancellation is better when $\bar{F}_X$
is log-concave, but ineffective for log-convex $\bar{F}_X$.


Theorem 2. The expected latency and cost of the (n, 1) fork-early-cancel system are given by

$$E[T] = E[T^{M/G/n}], \qquad (8)$$

$$E[C] = E[X], \qquad (9)$$

where $T^{M/G/n}$ is the response time of an M/G/n queueing system with service time $X \sim F_X$.

Fig. 6: Equivalence of the (n, 1) fork-early-cancel system to an M/G/n queue with each server taking time $X \sim F_X$ to serve
a task, i.i.d. across servers and tasks.

Proof. In the (n, 1) fork-early-cancel system, when any one task reaches the head of its queue, all
others are canceled immediately. The redundant tasks help find the queue with the least work left,
and exactly one task of each job is served by the first server that becomes idle. Thus, as illustrated
in Fig. 6, the latency of the (n, 1) fork-early-cancel system is equivalent in distribution to that of an M/G/n
queue. Hence $E[T] = E[T^{M/G/n}]$ and $E[C] = E[X]$. □

The exact analysis of the mean response time $E[T^{M/G/n}]$ has long been an open problem in queueing
theory. A well-known approximation given by [Lee and Longton 1959] is

$$E[T^{M/G/n}] \approx E[X] + \frac{E[X^2]}{2 E[X]^2} E[W^{M/M/n}], \qquad (10)$$

where $E[W^{M/M/n}]$ is the expected waiting time in an M/M/n queueing system with load $\rho = \lambda E[X] / n$.

This expected waiting time can be evaluated using the Erlang-C model [Harchol-Balter 2013,
Chapter 14]. A related work that studies the centralized queue model that the (n, 1) fork-early-cancel system
is equivalent to is [Visschers et al. 2012], which considers the case of heterogeneous job classes with
exponential service times.
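
The approximation above can be sketched in a few lines. The code assumes the standard Erlang-C expression for the M/M/n waiting time and scales it by $E[X^2] / (2 E[X]^2)$, as in the Lee and Longton approximation. The function names are hypothetical, and the inputs are the first two moments of the service time.

```python
from math import factorial


def mmn_mean_wait(n, lam, mean_service):
    """Expected waiting time in an M/M/n queue (Erlang-C formula)."""
    offered = lam * mean_service      # offered load lam * E[X]
    rho = offered / n                 # per-server utilization
    if rho >= 1.0:
        raise ValueError("unstable: need lam * E[X] / n < 1")
    tail = offered**n / (factorial(n) * (1.0 - rho))
    p_wait = tail / (sum(offered**k / factorial(k) for k in range(n)) + tail)
    return p_wait * mean_service / (n * (1.0 - rho))


def mgn_latency_approx(n, lam, m1, m2):
    """Approximate E[T] of an M/G/n queue, with m1 = E[X] and m2 = E[X^2]."""
    return m1 + m2 / (2.0 * m1**2) * mmn_mean_wait(n, lam, m1)


# Sanity checks on cases with known answers (sample inputs are assumptions).
one_server = mgn_latency_approx(1, 0.1, 4.0, 20.0)   # reduces to Pollaczek-Khinchine
two_exp_servers = mgn_latency_approx(2, 1.0, 1.0, 2.0)  # exponential service, M/M/2
```

For n = 1 the expression collapses to the Pollaczek-Khinchine formula, and for exponential service the scaling factor equals 1, so the approximation is exact in both of these special cases.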

Next we compare the latency and cost with and without early cancellation given by Theorem 2 and 
Theorem 1. Corollary 4 below follows from Lemma 2. 

Corollary 4. If $\bar{F}_X$ is log-concave (log-convex), then E[C] of the (n, 1) fork-early-cancel system is
less than or equal to (greater than or equal to) that of the (n, 1) fork-join system.

In the low $\lambda$ regime, the (n, 1) fork-join system gives lower E[T] than (n, 1) fork-early-cancel because
of higher diversity due to redundant tasks. By Corollary 1, in the high $\lambda$ regime, the system with lower
E[C] has lower expected latency.

Corollary 5. If $\bar{F}_X$ is log-concave, early cancellation gives higher E[T] than (n, 1) fork-join when
$\lambda$ is small, and lower E[T] in the high $\lambda$ regime. If $\bar{F}_X$ is log-convex, then early cancellation gives higher E[T]
for both low and high $\lambda$.

Fig. 7 and Fig. 8 illustrate Corollary 5. Fig. 7 shows a comparison of E[T] with and without early
cancellation of redundant tasks for the (4, 1) system with service time $X \sim$ ShiftedExp(2, 0.5). We
observe that early cancellation gives lower E[T] in the high $\lambda$ regime. In Fig. 8 we observe that when
X is HyperExp(0.1, 1.5, 0.5), which is log-convex, early cancellation is worse for both small and large $\lambda$.
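
The heavy-traffic part of this comparison can be checked through Lemma 1, since the maximum supported arrival rate is n / E[C], with $E[C] = n E[X_{1:n}]$ without early cancellation and $E[C] = E[X]$ with it. The sketch below computes both limits for the two (4, 1) examples. Reading HyperExp(0.1, 1.5, 0.5) as p = 0.1, $\mu_1$ = 1.5, $\mu_2$ = 0.5 is an assumption of this sketch, as are the function names.

```python
from math import comb


def mean_min_shifted_exp(n, delta, mu):
    """E[X_{1:n}] for X = delta + Exp(mu)."""
    return delta + 1.0 / (n * mu)


def mean_min_hyperexp(n, p, mu1, mu2):
    """E[X_{1:n}] for a mixture of Exp(mu1) (prob. p) and Exp(mu2) (prob. 1 - p)."""
    return sum(
        comb(n, j) * p**j * (1 - p) ** (n - j) / (j * mu1 + (n - j) * mu2)
        for j in range(n + 1)
    )


def max_rates(n, mean_service, mean_min):
    """Return (fork-join, fork-early-cancel) maximum arrival rates n / E[C]."""
    fork_join = n / (n * mean_min)     # E[C] = n * E[X_{1:n}]
    early_cancel = n / mean_service    # E[C] = E[X]
    return fork_join, early_cancel


n = 4
shifted_rates = max_rates(n, mean_min_shifted_exp(1, 2.0, 0.5), mean_min_shifted_exp(n, 2.0, 0.5))
hyper_rates = max_rates(n, mean_min_hyperexp(1, 0.1, 1.5, 0.5), mean_min_hyperexp(n, 0.1, 1.5, 0.5))
```

Under these assumptions, early cancellation supports a higher arrival rate for the shifted exponential example, while keeping the redundant tasks supports a higher rate for the hyperexponential example.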

In general, early cancellation is better when X is less variable (lower coefficient of variation). For
example, a comparison of E[T] with (n, 1) fork-join and (n, 1) fork-early-cancel systems as $\Delta$, the
constant shift of service time ShiftedExp($\Delta, \mu$), varies indicates that early cancellation is better for larger
$\Delta$. When $\Delta$ is small, there is more randomness in the service time of a task, and hence keeping the
redundant tasks running gives more diversity and lower E[T]. But as $\Delta$ increases, task service times
are more deterministic, due to which it is better to cancel the redundant tasks early.

Fig. 7: For the (4, 1) system with service time $X \sim$ ShiftedExp(2, 0.5), which is log-concave, early cancellation is
better in the high $\lambda$ regime, as given by Corollary 5.

Fig. 8: For the (4, 1) system with $X \sim$ HyperExp(0.1, 1.5, 0.5), which is log-convex, early cancellation is worse in both low
and high $\lambda$ regimes, as given by Corollary 5.

6 PARTIAL FORKING (k = 1 CASE) 

For applications with a large number of servers n, full forking of jobs to all servers can be expensive in 
terms of the network cost of issuing and canceling the tasks. In this section we analyze the k = 1 case 
of the (n, r, k) fork-join system, where an incoming job is forked to some r out of n servers and we wait 
for any one task to finish. The r servers are chosen using a symmetric policy (Definition 4). Some 
examples of symmetric policies are: 

(1) Group-based random: This policy holds when r divides n. The n servers are divided into n/r groups 
of r servers each. A job is forked to one of these groups, chosen uniformly at random. 

(2) Uniform Random: A job is forked to any r out of n servers, chosen uniformly at random. 

Fig. 9 illustrates the (4,2,1) partial-fork-join system with the group-based random and the uniform- 
random policies. In the sequel, we develop insights into the best r and the choice of servers for a given 
service time distribution Fx. 

Remark 3 (Relation to Power-of-r Scheduling). Power-of-r scheduling [Mitzenmacher 1996] 
is a well-known policy in multi-server systems. It chooses r out of the n servers at random and assigns 
an incoming task to the shortest queue among them. A major advantage of the power-of-r policy is that 
even with r ≪ n, the latency it achieves is close to that of the join-the-shortest-queue policy 
(equivalent to power-of-r with r = n). 

The (n, r, 1) partial-fork-join system with uniform random policy also chooses r queues at random. 
However, instead of choosing the shortest queue, it creates replicas of the task at all the queues. The 
replicas help find the queue with the least work left, which gives better load balancing than joining 
the shortest queue. But unlike power-of-r, servers might spend redundant time on replicas that will 
eventually be canceled. 

(a) Group-based random 

(b) Uniform random 


Fig. 9: (4, 2,1) partial-fork-join system, where each job is forked to r = 2 servers, chosen according to the group-based random 
or uniform random policies. 


6.1 Latency-Cost Analysis 

In the group-based random policy, the job arrivals are split equally across the groups, and each group 
behaves like an independent (r, 1) fork-join system. Thus, the expected latency and cost follow from 
Theorem 1 as given in Lemma 3 below. 

Lemma 3 (Group-based random). The expected latency and cost when each job is forked to one 
of n/r groups of r servers each are given by 

$$ E[T] = E[X_{1:r}] + \frac{\lambda r E[X_{1:r}^2]}{2\left(n - \lambda r E[X_{1:r}]\right)} \tag{11} $$

$$ E[C] = r E[X_{1:r}] \tag{12} $$

Proof. The job arrivals are split equally across the n/r groups, such that the arrivals to each 
group form a Poisson process with rate λr/n. The r tasks of each job start service at their respective 
servers simultaneously, and thus each group behaves like an independent (r, 1) fork-join system with 
Poisson arrivals at rate λr/n. Hence, the expected latency and cost follow from Theorem 1. □ 

Using (12) and Lemma 1, we can infer that the service capacity (maximum supported λ) for an 
(n, r, 1) system with group-based random policy is 

$$ \lambda^*_{\max} = \frac{n}{r E[X_{1:r}]} \tag{13} $$

From (13) we can infer that the r that minimizes rE[X_{1:r}] results in the highest service capacity, and 
hence the lowest E[T] in the heavy traffic regime. By Lemma 2, the optimal r is r = 1 (r = n) for log- 
concave (log-convex) Fx. For distributions that are neither log-concave nor log-convex, an intermediate 
r may be optimal and we can determine it using Lemma 3. For example, Fig. 10 shows a plot of latency 
versus cost as given by Lemma 3 for n = 12 servers. The task service time is X ~ Pareto(1, 2.2). Each job 
is replicated at r servers according to the group-based random policy, with r varying along each curve. 
Initially, increasing r reduces the latency, but beyond r*, the replicas cause an increase in the queueing 
delay. This increase in queueing delay is more dominant for higher λ. Thus the optimal r* decreases 
as λ increases. 
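
The search for the best r under the group-based random policy can be sketched as follows. This is an illustrative sketch, not the code used for Fig. 10: the function names are our own, and it assumes the Pareto(x_m, α) parameterization, for which the minimum of r i.i.d. samples is again Pareto with shape rα.

```python
def pareto_min_moments(x_m, alpha, r):
    """First and second moments of X_{1:r} for i.i.d. Pareto(x_m, alpha) samples.

    The minimum of r such samples is Pareto(x_m, r * alpha); its second moment
    is finite only when r * alpha > 2.
    """
    a = r * alpha
    if a <= 2:
        raise ValueError("second moment is infinite unless r * alpha > 2")
    return a * x_m / (a - 1), a * x_m ** 2 / (a - 2)


def group_based_latency_cost(n, r, lam, x_m, alpha):
    """Evaluate (11) and (12); E[T] is reported as inf above the service capacity (13)."""
    if n % r != 0:
        raise ValueError("group-based random forking needs r to divide n")
    m1, m2 = pareto_min_moments(x_m, alpha, r)
    cost = r * m1
    slack = n - lam * r * m1        # positive iff lam < n / (r * E[X_{1:r}])
    latency = m1 + lam * r * m2 / (2 * slack) if slack > 0 else float("inf")
    return latency, cost


def best_r(n, lam, x_m, alpha):
    """Divisor r of n that minimizes the expected latency in (11)."""
    divisors = [r for r in range(1, n + 1) if n % r == 0]
    return min(divisors, key=lambda r: group_based_latency_cost(n, r, lam, x_m, alpha)[0])


if __name__ == "__main__":
    for lam in (0.5, 2.0, 4.0):
        print("lambda =", lam, "best r =", best_r(12, lam, 1.0, 2.2))
```

Sweeping λ in this way shows the latency-minimizing r moving toward smaller values as the load grows, in line with the discussion above.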

For other symmetric policies, it is difficult to get an exact analysis of E [T] and E [C] because the tasks 
of a job can start at different times. However, we can get bounds on E [C] depending on the log-concavity 
of X, given in Theorem 3 below. 

Theorem 3. Consider an (n, r, 1) partial-fork-join system, where a job is forked into tasks at r out 
of n servers chosen according to a symmetric policy. For any relative task start times t_i, E[C] can be 
bounded as follows. 

$$ r E[X_{1:r}] \ge E[C] \ge E[X] \quad \text{if } F_X \text{ is log-concave} \tag{14} $$

$$ E[X] \ge E[C] \ge r E[X_{1:r}] \quad \text{if } F_X \text{ is log-convex} \tag{15} $$

In the extreme case when r = 1, E[C] = E[X], and when r = n, E[C] = nE[X_{1:n}]. 

Fig. 10: Analytical plot of latency versus cost for n = 12 servers. Each job is replicated at r servers 
chosen by the group-based random policy with r increasing as 1, 2, 3, 4, 6, and 12 along each curve. The 
task service time X ~ Pareto(1, 2.2). As λ increases the replicas increase queueing delay. Thus the 
optimal r* that minimizes E[T] shifts downward as λ increases. 

Fig. 11: Expected cost E[C] versus r for X ~ ShiftedExp(1, 0.25), n = 6 servers, arrival rate λ = 0.5 and 
different scheduling policies (group-based random, uniform random, and join r shortest queues), shown 
with the upper bound rE[X_{1:r}] and the lower bound E[X]. The upper bound rE[X_{1:r}] is exact for the 
group-based random policy, and fairly tight for other policies. 

To prove Theorem 3 we take expectation in (4), and show that for log-concave and log-convex Fx, we 
get the bounds in (14) and (15), which are independent of the relative task start times t_i. The detailed 
proof is given in Appendix B. 

In Fig. 11 we show the bounds given by (14) for log-concave distributions alongside simulation values, 
for different scheduling policies. The service time is X ~ ShiftedExp(1, 0.25), and the arrival rate is 
λ = 0.5. Since all replicas start simultaneously with the group-based random policy, the upper bound 
E[C] ≤ rE[X_{1:r}] is tight for any r. For other scheduling policies, the bound is looser the more the 
policy staggers the relative start times of replicas. 

Fig. 12: For X ~ ShiftedExp(1, 0.5), which is log-concave, forking to more (fewer) servers reduces 
expected latency in the low (high) λ regime. Each job is replicated at r out of n = 6 servers, chosen by 
the group-based random policy. 

Fig. 13: For X ~ HyperExp(p, μ_1, μ_2) with p = 0.1, μ_1 = 1.5, and μ_2 = 0.5, which is log-convex, larger r 
gives lower expected latency for all λ. Each job is replicated at r out of n = 6 servers, chosen according 
to the group-based random policy. 

6.2 Optimal value of r 

We can use the bounds in Theorem 3 to gain insights into choosing the best r when Fx is log-concave 
or log-convex. In particular, we study two extreme traffic regimes: low traffic (λ → 0) and heavy traffic 
(λ → λ*_max), where λ*_max is the service capacity of the system introduced in Definition 5. 

Corollary 6 (Expected Cost vs. r). For a system of n servers with symmetric forking of each 
job to r servers, r = 1 (r = n) minimizes the expected cost E[C] when Fx is log-concave (log-convex). 

The proof follows from Lemma 2: rE[X_{1:r}] is non-decreasing (non-increasing) in r for log-concave 
(log-convex) Fx. 
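
The monotonicity of rE[X_{1:r}] can be checked numerically for the two example families used in this section. The sketch below is illustrative only: the function names are our own, and for HyperExp(p, μ_1, μ_2) we assume that p is the probability of the branch with rate μ_1.

```python
from math import comb


def shifted_exp_min_mean(delta, mu, r):
    """E[X_{1:r}] for X ~ ShiftedExp(delta, mu): the shift plus the mean of Exp(r * mu)."""
    return delta + 1.0 / (r * mu)


def hyper_exp_min_mean(p, mu1, mu2, r):
    """E[X_{1:r}] when Pr(X > x) = p * exp(-mu1 * x) + (1 - p) * exp(-mu2 * x).

    The r-th power of the tail is expanded with the binomial theorem and each
    exponential term is integrated over [0, infinity).
    """
    return sum(
        comb(r, j) * p ** j * (1 - p) ** (r - j) / (j * mu1 + (r - j) * mu2)
        for j in range(r + 1)
    )


def scaled_costs(min_mean, n):
    """List of r * E[X_{1:r}] for r = 1, ..., n."""
    return [r * min_mean(r) for r in range(1, n + 1)]


# Log-concave example: the sequence should not decrease with r.
concave = scaled_costs(lambda r: shifted_exp_min_mean(1.0, 0.5, r), 6)
# Log-convex example: the sequence should not increase with r.
convex = scaled_costs(lambda r: hyper_exp_min_mean(0.1, 1.5, 0.5, r), 6)

if __name__ == "__main__":
    print("ShiftedExp(1, 0.5):", concave)
    print("HyperExp(0.1, 1.5, 0.5):", convex)
```

The first sequence is smallest at r = 1 and the second at r = n, which is the pattern that Corollary 6 describes.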

Lemma 4 (Expected Latency vs. r). In the low-traffic regime, forking to all servers (r = n) gives 
the lowest E[T] for any service time distribution Fx. In the heavy traffic regime, r = 1 (r = n) gives 
the lowest E[T] if Fx is log-concave (log-convex). 

Proof. In the low traffic regime with λ → 0, the waiting time in queue tends to zero. Thus all 
replicas of a task start service at the same time, irrespective of the scheduling policy. Then the expected 
latency is E[T] = E[X_{1:r}], which decreases with r. Thus, r = n gives the lowest E[T] for any service 
time distribution Fx. 

By Corollary 1, the optimal replication strategy in heavy traffic is the one that minimizes E[C]. For 
log-convex Fx, r = n achieves the lower bound E[C] = nE[X_{1:n}] in (15) with equality. Thus, r = n is 
the optimal strategy in the heavy traffic regime. For log-concave Fx, r = 1 achieves the lower bound 
E[C] = E[X] in (14) with equality. Thus, in heavy traffic, r = 1 gives the lowest E[T] for log-concave Fx. □ 

Lemma 4 is illustrated by Fig. 12 and Fig. 13, where E[T] calculated analytically using (11) is plotted 
versus λ for different values of r. Each job is assigned to r out of n = 6 servers, chosen by the 
group-based random policy. In Fig. 12 the service time distribution is ShiftedExp(Δ, μ) (which is 
log-concave) with Δ = 1 and μ = 0.5. When λ is small, more redundancy (higher r) gives lower E[T], but in 
the high-λ regime, r = 1 gives the lowest E[T] and the highest service capacity. On the other hand, in 
Fig. 13, for a log-convex distribution HyperExp(p, μ_1, μ_2), in the high load regime E[T] decreases as r 
increases. 

Lemma 4 was previously proven for new-better-than-used (new-worse-than-used) instead of log- 
concave (log-convex) Fx in [Shah et al. 2013; Koole and Righter 2008], using a combinatorial argu¬ 
ment. Using Theorem 3, we get an alternative, and arguably simpler way to prove this result. Note 
that our version is weaker because log-concavity implies new-better-than-used but the converse is not 
true in general (see Property 3 in Appendix A). 

Due to the network cost of issuing and canceling the replicas, there may be an upper limit r ≤ r_max 
on the number of replicas. The optimal strategy under this constraint is given by Lemma 5 below. 

Lemma 5 (Optimal r under r ≤ r_max). For log-convex Fx, r = r_max is optimal. For log-concave 
Fx, r = 1 is optimal in heavy traffic. 

The proof is similar to that of Lemma 4, with n replaced by r_max. 

6.3 Choice of the r servers 

For a given r, we now compare different policies of choosing the r servers for each job. The choice of the 
r servers determines the relative starting times of the tasks. By using the bounds in Theorem 3 that 
hold for any relative task start times we get the following result. 

Lemma 6 (Cost of different policies). Given r, if Fx is log-concave (log-convex), the symmetric 
policy that results in the tasks starting at the same time (t_i = 0 for all 1 ≤ i ≤ r) results in higher 
(lower) E[C] than one that results in 0 < t_i < ∞ for one or more i. 

Proof. The symmetric policy that results in t_i = 0 for all 1 ≤ i ≤ r (e.g., the group-based random 
policy) results in E[C] = rE[X_{1:r}]. By Theorem 3, if Fx is log-concave, E[C] ≤ rE[X_{1:r}] for any 
symmetric policy. Thus, for log-concave distributions, the symmetric policy that results in 0 < t_i < ∞ 
for one or more i gives lower E[C] than the group-based random policy. On the other hand, for log-convex 
distributions, E[C] ≥ rE[X_{1:r}] with any symmetric policy. Thus the policies that result in relative task 
start times t_i = 0 for all 1 ≤ i ≤ r give lower E[C] than other symmetric policies. □ 

Lemma 7 (Latency in high λ regime). Given r, if Fx is log-concave (log-convex), the symmetric 
policy that results in the tasks starting at the same time (t_i = 0 for all 1 ≤ i ≤ r) results in higher 
(lower) E[T] in the heavy traffic regime than one that results in 0 < t_i < ∞ for some i. 

Proof. By Corollary 1, the optimal replication strategy in heavy traffic is the one that minimizes 
E [C]. Then the proof follows from Lemma 6. □ 

Lemma 7 is illustrated by Fig. 14 and Fig. 15 for n = 6 and r = 3. The simulations are run for 
100 workloads with 1000 jobs each. The r tasks may start at different times with the uniform random 
policy, whereas they always start simultaneously with the group-based random policy. Thus, in the 
high-λ regime, the uniform random policy results in lower latency for log-concave Fx, as observed in 
Fig. 14. But for log-convex Fx, group-based forking is better in the high-λ regime, as seen in Fig. 15. 
For low λ, the uniform random policy is better for any Fx because it gives lower expected waiting time 
in queue. 

7 THE GENERAL k CASE 

We now move to the general k case, where a job requires any k out of n tasks to complete. In practice, 
the general k case arises in large-scale parallel computing frameworks such as MapReduce, and in content 
download from coded distributed storage systems. In this section we present bounds on the latency 
and cost of the (n, k) fork-join and (n, k) fork-early-cancel systems. In Section 7.2 we demonstrate an 
interesting diversity-parallelism trade-off in choosing k. 

Fig. 14: For service time distribution ShiftedExp(1, 0.5), which is log-concave, uniform random 
scheduling (which staggers relative task start times) gives lower E[T] than group-based random for all 
λ. The system parameters are n = 6, r = 3. 

Fig. 15: For service time distribution HyperExp(0.1, 2.0, 0.2), which is log-convex, group-based 
scheduling gives lower E[T] than uniform random in the high-λ regime. The system parameters are 
n = 6, r = 3. 


7.1 Latency and Cost of the (n, k) fork-join system 

Unlike the k = 1 case, for general k exact analysis is hard because multiple jobs can be in service 
simultaneously (e.g., Job A and Job B in Fig. 1). Even for the k = n case studied in [Nelson and 
Tantawi 1988; Varki et al. 2008], only bounds on latency are known. We generalize those latency 
bounds to any k, and also provide bounds on the cost E[C]. The analysis of E[C] can be used to estimate 
the service capacity using Lemma 1. 


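Theorem 4 (Bounds on Latency). The latency E[T] is bounded as follows. 

$$ E[T] \le E[X_{k:n}] + \frac{\lambda E[X_{k:n}^2]}{2\left(1 - \lambda E[X_{k:n}]\right)} \tag{16} $$

$$ E[T] \ge E[X_{k:n}] + \frac{\lambda E[X_{1:n}^2]}{2\left(1 - \lambda E[X_{1:n}]\right)} \tag{17} $$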


The proof is given in Appendix C. In Fig. 16 we plot the bounds on latency alongside the simulation 
values for Pareto service time. The upper bound (16) becomes more loose as k increases, because the 
split-merge system considered to get the upper bound (see proof of Theorem 4) becomes worse as 
compared to the fork-join system. For the special case k = n we can improve the upper bound in 
Lemma 8 below, by generalizing the approach used in [Nelson and Tantawi 1988]. 
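
The bounds (16) and (17) are straightforward to evaluate once the moments of the order statistics are available. The sketch below does this for Pareto service times. It is an illustration under our own assumptions, not the code behind Fig. 16: the function names are hypothetical, and the moments come from the representation X = x_m U^(-1/α) with U uniform on (0, 1), so that the k-th smallest X corresponds to a Beta(n − k + 1, k) order statistic of U.

```python
from math import gamma, inf


def pareto_order_stat_moment(x_m, alpha, k, n, m):
    """E[X_{k:n}^m] for i.i.d. Pareto(x_m, alpha) samples, or inf if the moment diverges."""
    s = m / alpha
    a = n - k + 1                   # first Beta parameter of the matching uniform order statistic
    if a <= s:
        return inf
    # E[U^(-s)] for U ~ Beta(a, k) equals Gamma(a - s) Gamma(n + 1) / (Gamma(a) Gamma(n + 1 - s))
    return x_m ** m * gamma(a - s) * gamma(n + 1) / (gamma(a) * gamma(n + 1 - s))


def latency_bounds(n, k, lam, x_m, alpha):
    """Return (lower, upper) bounds on E[T] from (17) and (16)."""

    def queueing_term(m1, m2):
        # Pollaczek-Khinchine waiting time; infinite if the queue is overloaded
        if m2 == inf or lam * m1 >= 1:
            return inf
        return lam * m2 / (2 * (1 - lam * m1))

    e_k = pareto_order_stat_moment(x_m, alpha, k, n, 1)
    upper = e_k + queueing_term(e_k, pareto_order_stat_moment(x_m, alpha, k, n, 2))
    lower = e_k + queueing_term(
        pareto_order_stat_moment(x_m, alpha, 1, n, 1),
        pareto_order_stat_moment(x_m, alpha, 1, n, 2),
    )
    return lower, upper


if __name__ == "__main__":
    for k in range(1, 11):
        print(k, latency_bounds(10, k, 0.5, 0.5, 2.5))
```

For k = 1 the two bounds coincide, and the gap widens as k grows, which is the behavior described above for the split-merge upper bound.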


Lemma 8 (Tighter Upper Bound when k = n). For the case k = n, another upper bound on 
latency is given by 

$$ E[T] \le E[\max(R_1, R_2, \ldots, R_n)], \tag{18} $$

where R_i are i.i.d. realizations of the response time R of an M/G/1 queue with arrival rate λ and service 
time distribution Fx. 

The proof is given in Appendix C. Transform analysis [Harchol-Balter 2013, Chapter 25] can be 
used to determine the distribution of R, the response time of an M/G/1 queue, in terms of Fx(x). The 
Laplace-Stieltjes transform R(s) of the probability density function f_R(r) of R is given by 

$$ R(s) = \frac{s X(s) \left(1 - \lambda E[X]\right)}{s - \lambda\left(1 - X(s)\right)}, \tag{19} $$

where X(s) is the Laplace-Stieltjes transform of the service time density f_X(x). 
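
As a sanity check on (19) and (18), consider the special case of exponential service with rate μ > λ. This specialization is our own illustrative choice and is not part of the lemma. The transform collapses to that of an exponential random variable with rate μ − λ, and the expected maximum of n i.i.d. copies has a closed form:

```latex
\begin{align*}
X(s) &= \frac{\mu}{s+\mu}, \qquad E[X] = \frac{1}{\mu}, \\
R(s) &= \frac{s \cdot \frac{\mu}{s+\mu}\left(1 - \frac{\lambda}{\mu}\right)}
             {s - \lambda \cdot \frac{s}{s+\mu}}
      = \frac{\mu - \lambda}{s + \mu - \lambda}, \\
E\left[\max(R_1, \ldots, R_n)\right] &= \frac{H_n}{\mu - \lambda},
      \qquad H_n = \sum_{i=1}^{n} \frac{1}{i}.
\end{align*}
```

So in this special case the bound in (18) grows only logarithmically in n, through the harmonic number H_n.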

The lower bound on latency (17) can be improved for shifted exponential Fx, generalizing the ap¬ 
proach in [Varki et al. 2008] based on the memoryless property of the exponential tail. 


Fig. 16: Bounds on latency E[T] versus k, the number of servers we need to wait for (Theorem 4), 
alongside simulation values. The service time X ~ Pareto(0.5, 2.5), n = 10, and λ = 0.5. A tighter upper 
bound for k = n is evaluated using Lemma 8. 

Fig. 17: Bounds on cost E[C] versus k, the number of servers we need to wait for (Theorem 5), alongside 
simulation values. The service time X ~ Pareto(0.5, 2.5), n = 10, and λ = 0.5. The bounds are tight for 
k = 1 and k = n. 


Theorem 5 (Bounds on Cost). The expected computing cost E[C] can be bounded as follows. 

$$ E[C] \le (k-1) E[X] + (n-k+1) E[X_{1:n-k+1}] \tag{20} $$

$$ E[C] \ge \sum_{i=1}^{k} E[X_{i:n}] + (n-k) E[X_{1:n-k+1}] \tag{21} $$

The proof is given in Appendix C. Fig. 17 shows the bounds alongside the simulation plot of the 
computing cost E[C] when Fx is Pareto(x_m, α) with x_m = 0.5 and α = 2.5. The arrival rate λ = 0.5, and 
n = 10 with k varying from 1 to 10 on the x-axis. The simulation is run for 100 iterations of 1000 jobs. 
We observe that the bounds on E[C] are tight for k = 1 and k = n, which can also be inferred from (20) 
and (21). 
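
The tightness of (20) and (21) at k = 1 and k = n can be confirmed numerically. The following is a small sketch under our own assumptions (hypothetical function names, Pareto(x_m, α) service time with α > 1, and the same uniform-order-statistic representation of Pareto order statistics as a modeling device):

```python
from math import gamma


def pareto_order_stat_mean(x_m, alpha, k, n):
    """E[X_{k:n}] for i.i.d. Pareto(x_m, alpha) samples with alpha > 1."""
    s = 1.0 / alpha
    a = n - k + 1
    return x_m * gamma(a - s) * gamma(n + 1) / (gamma(a) * gamma(n + 1 - s))


def cost_bounds(n, k, x_m, alpha):
    """Return (lower, upper) bounds on E[C] from (21) and (20)."""
    mean = pareto_order_stat_mean(x_m, alpha, 1, 1)                   # E[X]
    first_of_rest = pareto_order_stat_mean(x_m, alpha, 1, n - k + 1)  # E[X_{1:n-k+1}]
    upper = (k - 1) * mean + (n - k + 1) * first_of_rest
    lower = sum(pareto_order_stat_mean(x_m, alpha, i, n) for i in range(1, k + 1))
    lower += (n - k) * first_of_rest
    return lower, upper


if __name__ == "__main__":
    for k in range(1, 11):
        print(k, cost_bounds(10, k, 0.5, 2.5))
```

At k = 1 both bounds equal nE[X_{1:n}], and at k = n both equal nE[X], since the order-statistic means sum to n times the mean.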

7.2 Diversity-Parallelism Trade-off 

In Fig. 16 we observed that the expected latency increases with k, because we need to wait for more 
tasks to complete, and the service time X is independent of k. But in most computing and storage 
applications, the service time X decreases as k increases because each task becomes smaller. We refer to 
this as the ‘parallelism benefit’ of splitting a job into more tasks. But as k increases, we lose the 
‘diversity benefit’ provided by redundant tasks and having to wait only for a subset of the tasks to 
finish. Thus, there is a diversity-parallelism trade-off in choosing the optimal k* that minimizes latency 
E[T]. We demonstrate the diversity-parallelism trade-off in the simulation plot Fig. 18 for service time 
X ~ ShiftedExp(Δ_k, μ), with μ = 1.0 and Δ_k = Δ/k. As k increases, we lose diversity but the parallelism 
benefit is higher because each task is smaller. As Δ increases, the optimal k* shifts upward because the 
service time distribution becomes ‘less random’ and so there is less diversity benefit. 

Fig. 18: Expected latency versus k, the number of servers we need to wait for, for task service time 
X ~ ShiftedExp(Δ/k, 1.0) and arrival rate λ = 0.5. As k increases, we lose diversity but the parallelism 
benefit is higher because each task is smaller. 

Fig. 19: Expected cost versus k, the number of servers we need to wait for, for task service time 
X ~ ShiftedExp(Δ/k, 1.0) and arrival rate λ = 0.5. As k increases, we lose diversity but the parallelism 
benefit is higher because each task is smaller. 

We can also observe the diversity-parallelism trade-off mathematically in the low traffic regime, for 
X ~ ShiftedExp(Δ/k, μ). If we take λ → 0 in (17) and (16), both bounds coincide and we get 

$$ \lim_{\lambda \to 0} E[T] = E[X_{k:n}] = \frac{\Delta}{k} + \frac{H_n - H_{n-k}}{\mu}, \tag{22} $$

where H_n = \sum_{i=1}^{n} 1/i is the n-th harmonic number. The parallelism benefit comes from the first 
term in (22), which reduces with k. The diversity of waiting for k out of n tasks causes the second term 
to increase with k. The optimal k* that minimizes (22) strikes a balance between these two opposing 
trends. 
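
The balance between the two terms of (22) can be explored with a short script. This is a minimal sketch with hypothetical function names. It evaluates only the low-traffic limit, so it says nothing about the queueing delay at higher arrival rates.

```python
def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m, with H_0 = 0."""
    return sum(1.0 / i for i in range(1, m + 1))


def low_traffic_latency(n, k, delta, mu):
    """Right-hand side of (22) for X ~ ShiftedExp(delta / k, mu)."""
    return delta / k + (harmonic(n) - harmonic(n - k)) / mu


def best_k(n, delta, mu):
    """The k in 1..n that minimizes (22); ties are broken toward the smaller k."""
    return min(range(1, n + 1), key=lambda k: low_traffic_latency(n, k, delta, mu))


if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0, 2.0, 5.0, 20.0):
        print("Delta =", delta, "best k =", best_k(10, delta, 1.0))
```

With no shift (Δ = 0) there is no parallelism benefit and the minimizer is k = 1. Larger shifts push the minimizer toward larger k.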

Fig. 19 shows a similar diversity-parallelism trade-off in choosing k to minimize the computing cost 
E[C]. In the heavy traffic regime, by Corollary 1 the policy that minimizes E[C] also minimizes E[T]. 
Thus the same k* will minimize both E[T] and E[C]. 


7.3 Latency and Cost of the (n, k) fork-early-cancel system 

We now analyze the latency and cost of the (n, k) fork-early-cancel system where the redundant tasks 
are canceled as soon as any k tasks start service. 

Theorem 6 (Latency-Cost with Early Cancellation). The cost E[C] and an upper bound 
on the expected latency E[T] with early cancellation are given by 

$$ E[C] = k E[X] \tag{23} $$

$$ E[T] \le E[\max(R_1, R_2, \ldots, R_k)] \tag{24} $$

where R_i are i.i.d. realizations of R, the response time of an M/G/1 queue with arrival rate λk/n and 
service time distribution Fx. 

The proof is given in Appendix C. The Laplace-Stieltjes transform of the response time of an M/G/1 
queue with service time distribution Fx(x) and arrival rate λk/n is the same as (19), with λ replaced by 
λk/n. 

By comparing the cost E[C] = kE[X] in (23) to the bounds in Theorem 5 without early cancellation, 
we can get insights into when early cancellation is effective for a given service time distribution Fx. 
For example, when Fx is log-convex, the upper bound in (20) is smaller than kE[X]. Thus we can infer 
that the (n, k) fork-early-cancel system is always worse than the (n, k) fork-join system when X is 
log-convex. We also observed this phenomenon in Fig. 8 for the k = 1 case. 

8 GENERAL REDUNDANCY STRATEGY 

From the analysis in Section 5 and Section 6, we get insights into designing the best redundancy 
strategy for log-concave and log-convex service time. But it is not obvious to infer the best strategy for 
arbitrary service time distributions, or when only empirical traces of the service time are given. We now 
propose such a redundancy strategy to minimize the latency, subject to computing and network cost 
constraints. This strategy can also be used on traces of task service time when closed-form expressions 
of Fx and its order statistics are not known. 

8.1 Generalized Fork-join Model 

We first introduce a general fork-join variant that is a combination of the partial fork introduced in 
Section 2, and partial early cancellation of redundant tasks. 

Definition 7 ((n, r_f, r, k) fork-join system). For a system of n servers and a job that requires k 
tasks to complete, we do the following: 

—Fork the job to r_f out of the n servers chosen uniformly at random. 

—When any r ≤ r_f tasks are at the head of queues or in service already, cancel all other tasks 
immediately. If more than r tasks start service simultaneously, retain r randomly chosen ones out of 
them. 

—When any k ≤ r tasks finish, cancel all remaining tasks immediately. 

Note that k tasks may finish before some r start service, and thus we may not need to perform the 
partial early cancellation in the second step above. 

Recall that the n servers have service time distribution X that is i.i.d. across the servers and tasks. 
The r_f − r tasks that are canceled early help find the shortest r out of the r_f queues, thus reducing 
waiting time. From the r tasks retained, waiting for any k to finish provides diversity and hence 
reduces service time. 

The special cases (n, n, n, k), (n, n, k, k) and (n, r, r, k) correspond to the (n, k) fork-join, (n, k) 
fork-early-cancel and (n, r, k) partial-fork-join systems respectively, which are defined in Section 2. 

8.2 Choosing Parameters r_f and r 

We propose a strategy to choose r_f and r to minimize expected latency E[T], subject to a computing cost 
constraint E[C] ≤ γ and a network cost constraint r_f ≤ r_max. We impose the second constraint 
because forking to more servers results in higher network cost of remote-procedure-calls (RPCs) to 
launch and cancel the tasks. 

Definition 8 (Proposed Redundancy Strategy). Choose r_f and r to minimize E[T] subject 
to constraints E[C] ≤ γ and r_f ≤ r_max. The solutions are 

$$ r_f^* = r_{\max}, \tag{25} $$

$$ r^* = \arg\min_{r \in [0, r_{\max}]} \hat{T}(r), \quad \text{s.t. } \hat{C}(r) \le \gamma, \tag{26} $$

where T̂(r) and Ĉ(r) are estimates of the expected latency E[T] and cost E[C], defined as follows: 

$$ \hat{T}(r) \triangleq E[X_{k:r}] + \frac{\lambda r E[X_{k:r}^2]}{2\left(n - \lambda r E[X_{k:r}]\right)}, \tag{27} $$

$$ \hat{C}(r) \triangleq r E[X_{k:r}]. \tag{28} $$

(28) 


To justify the strategy above, observe that for a given r, increasing r_f gives higher diversity in 
finding the queues with the least work left and thus reduces latency. Since r_f − r tasks are canceled 
early before starting service, r_f affects E[C] only mildly, through the relative task start times of the r 
tasks that are retained. So we conjecture that it is optimal to set r_f = r_max in (25), the maximum value 
possible under network cost constraints. Changing r, on the other hand, does affect both the computing 
cost and latency significantly. Thus to determine the optimal r, we minimize T̂(r) subject to constraints 
Ĉ(r) ≤ γ and r ≤ r_max as given in (26). 

The estimates T̂(r) and Ĉ(r) are obtained by generalizing Lemma 3 for group-based random forking 
to any k, and to r that may not divide n. When the order statistics of Fx are hard to compute, or Fx itself 
is not explicitly known, T̂(r) and Ĉ(r) can also be found using empirical traces of X. 
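
When only a trace of service times is available, the strategy can be sketched as below. This is a rough illustration, not the authors' implementation: the function names, the Monte Carlo resampling used to estimate the order-statistic moments, the number of draws, and the restriction of the search to r between k and r_max are all our own assumptions.

```python
import random


def order_stat_moments(trace, k, r, num_draws=20000, seed=0):
    """Monte Carlo estimates of E[X_{k:r}] and E[X_{k:r}^2] by resampling the trace."""
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(num_draws):
        # k-th smallest of r service times drawn with replacement from the trace
        kth = sorted(rng.choices(trace, k=r))[k - 1]
        total += kth
        total_sq += kth * kth
    return total / num_draws, total_sq / num_draws


def estimates(trace, n, k, r, lam):
    """Return (T_hat(r), C_hat(r)) following (27) and (28)."""
    m1, m2 = order_stat_moments(trace, k, r)
    slack = n - lam * r * m1
    t_hat = m1 + lam * r * m2 / (2 * slack) if slack > 0 else float("inf")
    return t_hat, r * m1


def choose_redundancy(trace, n, k, lam, gamma, r_max):
    """Return (r_f, r) per (25) and (26), or None if no r meets the cost budget gamma."""
    feasible = []
    for r in range(k, r_max + 1):
        t_hat, c_hat = estimates(trace, n, k, r, lam)
        if c_hat <= gamma:
            feasible.append((t_hat, r))
    if not feasible:
        return None
    return r_max, min(feasible)[1]


if __name__ == "__main__":
    demo_rng = random.Random(1)
    demo_trace = [demo_rng.paretovariate(2.2) for _ in range(5000)]  # synthetic stand-in trace
    print(choose_redundancy(demo_trace, n=10, k=1, lam=0.7, gamma=5.0, r_max=8))
```

For heavy-tailed traces the second-moment estimate can be noisy, so in practice it may be worth repeating the search with several seeds or a longer trace before committing to a value of r.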

The sources of inaccuracy in the estimates T̂(r) and Ĉ(r) are as follows. 

(1) For k > 1, the latency estimate T̂(r) is a generalization of the split-merge queueing upper bound in 
Theorem 4. Since the bound becomes loose as k increases, the error |T̂(r) − E[T]| increases with k. 

(2) The estimates T̂(r) and Ĉ(r) are by definition independent of r_f, which is not true in practice. As 
explained above, for r_f > r, the actual E[T] is generally less than T̂(r), and E[C] can be slightly 
higher or lower than Ĉ(r). 

(3) Since the estimates T̂(r) and Ĉ(r) are based on group-based forking, they consider that all r tasks 
start simultaneously. Variability in relative task start times can result in actual latency and cost 
that are different from the estimates. For example, from Theorem 3 we can infer that when Fx is 
log-concave (log-convex), the actual computing cost E[C] is less than (greater than) Ĉ(r). 


Factor (1) above is the largest source of inaccuracy, especially for larger k and λ. Since the 
estimate T̂(r) is an upper bound on the actual latency, the r* and r_f* recommended by the strategy are 
smaller than or equal to their optimal values. Factors (2) and (3) only affect the relative task start 
times and generally result in a smaller error in estimating E[T] and E[C]. 


8.3 Simulation Results 

We now present simulation results comparing the proposed strategy given in Definition 8 to the (n, r, k) 
partial-fork-join system with r varying from k to n. The service time distributions considered here are 
neither log-concave nor log-convex, thus making it hard to directly infer the best redundancy strategy 
using the analysis presented in the previous sections. The simulations are run for 100 workloads with 
1000 jobs each. 

Fig. 20: The latency-cost trade-off of the proposed redundancy strategy (r* = 3, r_f* = 8) is close to that 
of the best (n, r, k) partial-fork-join system. The plot shows latency against expected computing cost 
E[C] for no redundancy (r_f = r = 1), the (n, r, 1) partial-fork-join system for 1 ≤ r ≤ n, and the 
proposed strategy. Service time X ~ Pareto(1, 2.2), and the cost constraints are E[C] ≤ 5 and 
r ≤ r_f ≤ 8. The first constraint is active in this example. 

Fig. 21: The latency-cost trade-off of the proposed redundancy strategy (r* = 5, r_f* = 5) is close to that 
of the best (n, r, k) partial-fork-join system. The plot shows latency against expected computing cost 
E[C] for no redundancy (r_f = r = 1), the (n, r, 1) partial-fork-join system for 1 ≤ r ≤ n, and the 
proposed strategy. The service time X is an equiprobable mixture of Exp(2) and ShiftedExp(1, 1.5), and 
the cost constraints are E[C] ≤ 2 and r ≤ r_f ≤ 5. The second constraint is active in this example. 

In Fig. 20 the service time X ~ Pareto(1, 2.2), n = 10, k = 1, and arrival rate λ = 0.7. The computing 
and network cost constraints are E[C] ≤ 5 and r_f ≤ 8 respectively. We observe that the proposed 
strategy gives a significant latency reduction as compared to the no redundancy case (r = k in the 
(n, r, k) partial-fork-join system). We also observe that the proposed strategy gives a latency-cost 
trade-off that is better than the (n, r, k) partial-fork-join system. Using partial early cancellation 
(r_f > r) gives an additional reduction in latency by providing greater diversity and helping us find the 
r out of r_f queues with the least work left. 
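
As a rough check, the estimates of Definition 8 can be evaluated in closed form for the Fig. 20 parameters. The sketch below is illustrative only: the function names are our own, and it assumes the Pareto(x_m, α) parameterization with x_m = 1 and α = 2.2, for which the minimum of r samples is Pareto(x_m, rα). Under these assumptions the search selects r = 3.

```python
def pareto_min_moments(x_m, alpha, r):
    """E[X_{1:r}] and E[X_{1:r}^2] for i.i.d. Pareto(x_m, alpha); requires r * alpha > 2."""
    a = r * alpha
    return a * x_m / (a - 1), a * x_m ** 2 / (a - 2)


def t_hat(n, r, lam, x_m, alpha):
    """Latency estimate for k = 1: the first-order-statistic case of the latency estimate."""
    m1, m2 = pareto_min_moments(x_m, alpha, r)
    slack = n - lam * r * m1
    return m1 + lam * r * m2 / (2 * slack) if slack > 0 else float("inf")


def c_hat(r, x_m, alpha):
    """Cost estimate for k = 1: r times the mean of the minimum of r samples."""
    return r * pareto_min_moments(x_m, alpha, r)[0]


def choose_r(n, lam, x_m, alpha, gamma, r_max):
    """Latency-minimizing r among 1..r_max whose estimated cost is within the budget gamma."""
    feasible = [r for r in range(1, r_max + 1) if c_hat(r, x_m, alpha) <= gamma]
    return min(feasible, key=lambda r: t_hat(n, r, lam, x_m, alpha))


r_star = choose_r(n=10, lam=0.7, x_m=1.0, alpha=2.2, gamma=5.0, r_max=8)

if __name__ == "__main__":
    print("r* =", r_star)
```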

In Fig. 21 we show a case where the cost E[C] does not always increase with the amount of 
redundancy r. The task service time X is a mixture of an exponential Exp(2) and a shifted exponential 
ShiftedExp(1, 1.5), each occurring with equal probability. The other parameters are n = 10, k = 1, and 
arrival rate λ = 0.3. The proposed strategy found using Definition 8 is r* = r_f* = r_max = 5, limited by 
the r_f ≤ r_max constraint rather than the E[C] ≤ γ constraint. Since r_f = r, it coincides exactly with 
the (n, r, k) partial-fork-join system. 

9 CONCLUDING REMARKS 

In this paper we consider a redundancy model where each incoming job is forked to queues at multiple 
servers and we wait for any one replica to finish. We analyze how redundancy affects the latency, 
and the cost of computing time, and demonstrate how the log-concavity of service time is a key factor 
affecting the latency-cost trade-off. Some key insights from this analysis are: 

—For log-convex service times, forking to more servers (more redundancy) reduces both latency and 
cost. On the other hand, for log-concave service times, more redundancy can reduce latency only at 
the expense of an increase in cost. 

—Early cancellation of redundant requests can save both latency and cost for log-concave service time, 
but it is not effective for log-convex service time. 

Using these insights, we also propose a general redundancy strategy for an arbitrary service time 
distribution, that may be neither log-concave nor log-convex. This strategy can also be used on empir¬ 
ical traces of service time, when a closed-form expression of the distribution is not known. 

Ongoing work includes developing online strategies to simultaneously learn the service time dis¬ 
tribution, and the best redundancy strategy. More broadly, the proposed redundancy techniques can 
be used to reduce latency in several applications beyond the realm of cloud storage and computing 
systems, for example crowdsourcing, algorithmic trading, manufacturing etc. 

APPENDIX 

A LOG-CONCAVITY OF Fx 

In this section we present some properties and examples of log-concave and log-convex random vari¬ 
ables that are relevant to this work. For more properties please see [Bagnoli and Bergstrom 2005]. 

Property 1 (Jensen’s Inequality). If Fx is log-concave, then for 0 < θ < 1 and for all 
x, y ∈ [0, ∞), 

$$ \Pr(X > \theta x + (1-\theta) y) \ge \Pr(X > x)^{\theta} \Pr(X > y)^{1-\theta}. \tag{29} $$

The inequality is reversed if Fx is log-convex. 

Proof. Since Fx is log-concave, log Fx is concave. Taking the log on both sides of (29) we get 
Jensen’s inequality, which holds for concave functions. □ 

In past literature, saying X is log-concave usually means that the density f is log-concave. This implies 
that F and F̄ are also log-concave. However, log-convex f does not always imply log-convexity of F and F̄. 

Property 2 (Scaling). If $\bar{F}_X$ is log-concave, for $0 < \theta < 1$, 

\[
\Pr(X > x) \le \Pr(X > \theta x)^{1/\theta}. \tag{30}
\]

The inequality is reversed if $\bar{F}_X$ is log-convex. 

Proof. We can derive (30) by setting $y = 0$ in (29). 

\begin{align}
\Pr(X > \theta x + (1-\theta) \cdot 0) &\ge \Pr(X > x)^{\theta} \, \Pr(X > 0)^{1-\theta} \tag{31} \\
\Pr(X > \theta x) &\ge \Pr(X > x)^{\theta}. \tag{32}
\end{align}

To get (32) we observe that if $\bar{F}_X$ is log-concave, then $\Pr(X > 0)$ has to be 1. Otherwise 
log-concavity is violated at $x = 0$. Raising both sides of (32) to the power $1/\theta$ we get (30). The 
reverse inequality for log-convex $\bar{F}_X$ can be proved similarly. □ 
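
The scaling inequality (30) can be spot-checked numerically. The following is a minimal sketch, assuming a unit-scale Weibull service time with tail $\exp(-x^c)$, which is log-concave for shape $c = 2$ and log-convex for $c = 0.5$; the helper names and the grid of test points are illustrative choices, not part of the original analysis.

```python
import math

def weibull_tail(x, shape):
    """Tail Pr(X > x) of a unit-scale Weibull; equals 1 for x <= 0."""
    return 1.0 if x <= 0 else math.exp(-(x ** shape))

def scaling_gap(shape, x, theta):
    """Slack in (30): Pr(X > theta*x)^(1/theta) - Pr(X > x)."""
    return weibull_tail(theta * x, shape) ** (1.0 / theta) - weibull_tail(x, shape)

# Illustrative grid of x values in (0, 5] and theta values in (0, 1).
xs = [0.1 * i for i in range(1, 51)]
thetas = [0.1, 0.25, 0.5, 0.75, 0.9]

# Shape 2 gives a log-concave tail, so the slack should be non-negative.
concave_gaps = [scaling_gap(2.0, x, th) for x in xs for th in thetas]
# Shape 0.5 gives a log-convex tail, so the inequality is reversed.
convex_gaps = [scaling_gap(0.5, x, th) for x in xs for th in thetas]

print(min(concave_gaps) >= 0.0, max(convex_gaps) <= 0.0)
```

A grid check of this kind only illustrates the property on the sampled points; it is not a substitute for the proof above.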

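Property 3 (Sub-multiplicativity). If $\bar{F}_X$ is log-concave, the conditional tail probability of X 
satisfies for all $t, x > 0$, 

\begin{align}
\Pr(X > x + t \mid X > t) &\le \Pr(X > x) \tag{33} \\
\Leftrightarrow \quad \Pr(X > x + t) &\le \Pr(X > x) \, \Pr(X > t). \tag{34}
\end{align}

The inequalities above are reversed if $\bar{F}_X$ is log-convex. 

Proof. 

\begin{align}
& \Pr(X > x) \, \Pr(X > t) \tag{35} \\
&= \Pr\left(X > \frac{x}{x+t}(x+t)\right) \Pr\left(X > \frac{t}{x+t}(x+t)\right) \tag{36} \\
&\ge \Pr(X > x+t)^{\frac{x}{x+t}} \, \Pr(X > x+t)^{\frac{t}{x+t}}, \tag{37}
\end{align}

where we apply Property 2, in the form (32), to each factor of (36) to get (37). The right-hand side 
of (37) equals $\Pr(X > x + t)$, which gives (34). Equation (33) follows by dividing both sides of (34) 
by $\Pr(X > t)$. □ 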


Note that for exponential $\bar{F}_X$, which is memoryless, (33) holds with equality. Thus log-concave 
distributions can be thought to have ‘optimistic memory’, because the conditional tail probability 
decreases over time. On the other hand, log-convex distributions have ‘pessimistic memory’ because the 
conditional tail probability increases over time. The definition of the notion ‘new-better-than-used’ 
in [Koole and Righter 2008] is the same as (33). By Property 3, log-concavity of $\bar{F}_X$ implies that 
X is new-better-than-used. New-better-than-used distributions are referred to as ‘light-everywhere’ in 
[Shah et al. 2013] and ‘new-longer-than-used’ in [Sun et al. 2015]. 
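
The ‘optimistic’ and ‘pessimistic’ memory described above can be seen by tabulating the conditional tail probability $\Pr(X > x + t \mid X > t)$ for increasing elapsed time $t$. This is a small sketch under the assumption of unit-scale Weibull tails (shape 2 as a log-concave example, shape 1 as the exponential case, shape 0.5 as a log-convex example); the function names and the chosen values of $x$ and $t$ are hypothetical.

```python
import math

def weibull_tail(x, shape):
    """Tail Pr(X > x) of a unit-scale Weibull; equals 1 for x <= 0."""
    return math.exp(-(max(x, 0.0) ** shape))

def conditional_tail(shape, x, t):
    """Pr(X > x + t | X > t): chance of running x more time units after t have elapsed."""
    return weibull_tail(x + t, shape) / weibull_tail(t, shape)

x = 1.0
elapsed = [0.0, 0.5, 1.0, 2.0, 4.0]

optimistic = [conditional_tail(2.0, x, t) for t in elapsed]   # log-concave tail
memoryless = [conditional_tail(1.0, x, t) for t in elapsed]   # exponential tail
pessimistic = [conditional_tail(0.5, x, t) for t in elapsed]  # log-convex tail

for name, row in [("log-concave", optimistic), ("exponential", memoryless), ("log-convex", pessimistic)]:
    print(name, [round(p, 4) for p in row])
```

At $t = 0$ each row equals the unconditional tail $\Pr(X > x)$, so the log-concave row never exceeds it, consistent with (33).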

Property 4 (Mean Residual Life). If $\bar{F}_X$ is log-concave (log-convex), $E[X - t \mid X > t]$, the 
mean residual life after time $t > 0$ has elapsed, is non-increasing (non-decreasing) in $t$. 

Proof of Lemma 2. Lemma 2 is true for log-concave $\bar{F}_X$ if $r E[X_{1:r}] \le (r+1) E[X_{1:r+1}]$ for 
all integers $r \ge 1$. This inequality can be simplified as follows. 

\begin{align}
r E[X_{1:r}] &\le (r+1) E[X_{1:r+1}] \tag{38} \\
\Leftrightarrow \quad r \int_0^\infty \Pr(X_{1:r} > x) \, dx &\le (r+1) \int_0^\infty \Pr(X_{1:r+1} > x) \, dx \tag{39} \\
\Leftrightarrow \quad \int_0^\infty r \Pr(X > x)^{r} \, dx &\le \int_0^\infty (r+1) \Pr(X > x)^{r+1} \, dx \tag{40} \\
\Leftrightarrow \quad \int_0^\infty \Pr\left(X > \frac{x'}{r}\right)^{r} dx' &\le \int_0^\infty \Pr\left(X > \frac{x'}{r+1}\right)^{r+1} dx' \tag{41}
\end{align}

We get (39) using the fact that the expected value of a non-negative random variable is equal to the 
integral of its tail distribution. To get (40) observe that since $X_{1:r} = \min(X_1, X_2, \dots, X_r)$ 
for i.i.d. $X_i$, we have $\Pr(X_{1:r} > x) = \Pr(X > x)^{r}$ for all $x > 0$. Similarly 
$\Pr(X_{1:r+1} > x) = \Pr(X > x)^{r+1}$. Next we perform a change of variables on both sides of (40), 
$x = x'/r$ on the left and $x = x'/(r+1)$ on the right, to get (41). 

Now we use Property 2 to compare the two integrands in (41). Setting $\theta = r/(r+1)$ and $x = x'/r$ 
in Property 2, we get 

\[
\Pr\left(X > \frac{x'}{r}\right)^{r} \le \Pr\left(X > \frac{x'}{r+1}\right)^{r+1} \quad \text{for all } x' > 0. \tag{42}
\]

Hence, by (42) and the equivalences in (38)-(41) it follows that for log-concave $\bar{F}_X$, 
$r E[X_{1:r}]$ is non-decreasing in $r$. For log-convex $\bar{F}_X$, we can show that $r E[X_{1:r}]$ is 
non-increasing in $r$ by reversing all inequalities above. 

□ 
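
The monotonicity of $r E[X_{1:r}]$ can be illustrated in closed form for one family of distributions. The sketch below assumes i.i.d. unit-scale Weibull service times with tail $\exp(-x^c)$: the minimum of $r$ such variables has tail $\exp(-r x^c)$, so it is again Weibull with scale $r^{-1/c}$ and mean $r^{-1/c}\,\Gamma(1 + 1/c)$. The function name is hypothetical and the Weibull choice is only an example.

```python
import math

def scaled_min_mean(r, shape):
    """r * E[X_{1:r}] for r i.i.d. unit-scale Weibull variables with the given shape.

    The minimum has tail exp(-r * x**shape), i.e. it is Weibull with scale
    r**(-1/shape), so its mean is r**(-1/shape) * Gamma(1 + 1/shape).
    """
    return r * r ** (-1.0 / shape) * math.gamma(1.0 + 1.0 / shape)

replicas = range(1, 6)
log_concave = [scaled_min_mean(r, 2.0) for r in replicas]   # shape 2: non-decreasing in r
exponential = [scaled_min_mean(r, 1.0) for r in replicas]   # shape 1: constant in r
log_convex = [scaled_min_mean(r, 0.5) for r in replicas]    # shape 0.5: non-increasing in r

for name, row in [("shape 2", log_concave), ("shape 1", exponential), ("shape 0.5", log_convex)]:
    print(name, [round(v, 4) for v in row])
```

For shape 0.5 the values are $2/r$, so each additional replica lowers $r E[X_{1:r}]$, while for shape 2 the values grow like $\sqrt{r}$.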

Property 5 (Hazard Rates). If $\bar{F}_X$ is log-concave (log-convex), then the hazard rate $h(x)$, which 
is defined by $f_X(x)/\bar{F}_X(x)$, where $f_X$ is the density of X, is non-decreasing (non-increasing) 
in $x$. 

Property 6 (Coefficient of Variation). The coefficient of variation $C_v = \sigma/\mu$ is the ratio of 
the standard deviation $\sigma$ and mean $\mu$ of random variable X. For log-concave (log-convex) X, 
$C_v \le 1$ ($C_v \ge 1$), and $C_v = 1$ when X is pure exponential. 
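
As a quick illustration of Property 6, the coefficient of variation can be computed in closed form for a unit-scale Weibull variable, whose first two moments are $\Gamma(1 + 1/c)$ and $\Gamma(1 + 2/c)$. This is a sketch for that one family only; the function name is an illustrative choice.

```python
import math

def weibull_cv(shape):
    """Coefficient of variation sigma/mu of a unit-scale Weibull with the given shape."""
    first_moment = math.gamma(1.0 + 1.0 / shape)
    second_moment = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(second_moment / first_moment ** 2 - 1.0)

cv_log_concave = weibull_cv(2.0)   # shape > 1: log-concave tail
cv_exponential = weibull_cv(1.0)   # shape = 1: exponential
cv_log_convex = weibull_cv(0.5)    # shape < 1: log-convex tail

print(cv_log_concave, cv_exponential, cv_log_convex)
```

The three values fall below, at, and above 1 respectively, matching the ordering stated in the property.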

Property 7 (Examples of Log-concave $\bar{F}_X$). The following distributions have log-concave $\bar{F}_X$: 

—Shifted Exponential (Exponential plus constant $\Delta > 0$) 
—Uniform over any convex set 
— Weibull with shape parameter c > 1 
—Gamma with shape parameter c > 1 
— Chi-squared with degrees of freedom c > 2 

Property 8 (Examples of Log-convex $\bar{F}_X$). The following distributions have log-convex $\bar{F}_X$: 

—Exponential 

—Hyper Exponential (Mixture of exponentials) 

— Weibull with shape parameter 0 < c < 1 
—Gamma with shape parameter 0 < c < 1 
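
The examples in Properties 7 and 8 can be spot-checked numerically by looking at the sign of the second differences of $\log \Pr(X > x)$ on an evenly spaced grid. The sketch below is a heuristic check, not a proof; the function `classify_tail`, the tolerance, the grid, and the distribution parameters are all assumptions made for illustration. A shifted hyperexponential is included as an example of a tail that is neither log-concave nor log-convex.

```python
import math

def classify_tail(tail, xs, tol=1e-9):
    """Classify a tail function on an evenly spaced grid via second differences of log tail."""
    logs = [math.log(tail(x)) for x in xs]
    second = [logs[i - 1] - 2.0 * logs[i] + logs[i + 1] for i in range(1, len(logs) - 1)]
    concave = all(d <= tol for d in second)
    convex = all(d >= -tol for d in second)
    if concave and convex:
        return "both"
    if concave:
        return "log-concave"
    if convex:
        return "log-convex"
    return "neither"

def exp_tail(x):
    """Exponential with rate 1."""
    return math.exp(-max(x, 0.0))

def shifted_exp_tail(x):
    """Constant shift of 1 plus an exponential with rate 1."""
    return exp_tail(x - 1.0)

def hyper_exp_tail(x):
    """Equal mixture of exponentials with rates 1 and 10."""
    y = max(x, 0.0)
    return 0.5 * math.exp(-y) + 0.5 * math.exp(-10.0 * y)

def shifted_hyper_exp_tail(x):
    """Constant shift of 1 plus the hyperexponential above."""
    return hyper_exp_tail(x - 1.0)

xs = [0.05 * i for i in range(101)]  # grid on [0, 5]
labels = {
    "exponential": classify_tail(exp_tail, xs),
    "shifted exponential": classify_tail(shifted_exp_tail, xs),
    "hyperexponential": classify_tail(hyper_exp_tail, xs),
    "shifted hyperexponential": classify_tail(shifted_hyper_exp_tail, xs),
}
print(labels)
```

The exponential is reported as both, since its log tail is linear.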


B PROOFS FOR THE k = 1 CASE 

Proof of Theorem 3. Using (4), we can express the cost $C$ in terms of the relative task start times 
$t_i$ and $S$ as follows. Since only $r$ tasks are invoked, the relative start times 
$t_{r+1}, \dots, t_n$ are equal to $\infty$. 

\[
C = S + (S - t_2)^{+} + \cdots + (S - t_r)^{+}, \tag{43}
\]

where $S$ is the time between the start of service of the earliest task, and when any 1 of the $r$ tasks 
finishes. The tail distribution of $S$ is given by 

\[
\Pr(S > s) = \prod_{i=1}^{r} \Pr(X > s - t_i). \tag{44}
\]

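By taking expectation on both sides of (43) and simplifying we get, 

\begin{align}
E[C] &= \sum_{u=1}^{r} \int_{t_u}^{\infty} \Pr(S > s) \, ds, \tag{45} \\
&= \sum_{u=1}^{r} u \int_{t_u}^{t_{u+1}} \Pr(S > s) \, ds, \tag{46} \\
&= \sum_{u=1}^{r} u \int_{0}^{t_{u+1} - t_u} \Pr(S > t_u + x) \, dx, \tag{47} \\
&= \sum_{u=1}^{r} u \int_{0}^{t_{u+1} - t_u} \prod_{i=1}^{u} \Pr(X > x + t_u - t_i) \, dx. \tag{48}
\end{align}
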


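We now prove that for log-concave $\bar{F}_X$, $E[C] \ge E[X]$. The proof that $E[C] \le E[X]$ when 
$\bar{F}_X$ is log-convex follows similarly with all inequalities below reversed. We express the 
integral in (48) as, 

\begin{align}
E[C] &= \sum_{u=1}^{r} u \left( \int_0^\infty \prod_{i=1}^{u} \Pr(X > x + t_u - t_i) \, dx - \int_0^\infty \prod_{i=1}^{u} \Pr(X > x + t_{u+1} - t_i) \, dx \right), \tag{49} \\
&= \sum_{u=1}^{r} \left( \int_0^\infty \prod_{i=1}^{u} \Pr\left(X > \frac{x'}{u} + t_u - t_i\right) dx' - \int_0^\infty \prod_{i=1}^{u} \Pr\left(X > \frac{x'}{u} + t_{u+1} - t_i\right) dx' \right), \tag{50} \\
&= E[X] + \sum_{u=2}^{r} \int_0^\infty \left( \prod_{i=1}^{u} \Pr\left(X > \frac{x'}{u} + t_u - t_i\right) - \prod_{i=1}^{u-1} \Pr\left(X > \frac{x'}{u-1} + t_u - t_i\right) \right) dx', \tag{51} \\
&\ge E[X], \tag{52}
\end{align}

where in (49) we express each integral in (48) as a difference of two integrals from 0 to $\infty$. In 
(50) we perform a change of variables $x = x'/u$. In (51) we rearrange the grouping of the terms in the 
sum; the $u$-th negative integral is put in the $(u+1)$-th term of the summation. Then the first term of 
the summation is simply $\int_0^\infty \Pr(X > x) \, dx$, which is equal to $E[X]$. In (52) we use the 
fact that each term in the summation in (51) is non-negative when $\bar{F}_X$ is log-concave. This is 
shown in Lemma 9 below. 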

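Next we prove that for log-concave $\bar{F}_X$, $E[C] \le r E[X_{1:r}]$. Again, the proof of 
$E[C] \ge r E[X_{1:r}]$ when $\bar{F}_X$ is log-convex follows with all the inequalities below reversed. 

\begin{align}
E[C] &\le \sum_{u=1}^{r} u \int_0^{t_{u+1} - t_u} \prod_{i=1}^{u} \Pr\left(X > \frac{u (x + t_u - t_i)}{r}\right)^{r/u} dx, \tag{53} \\
&= \sum_{u=1}^{r} \left( \int_0^\infty \prod_{i=1}^{u} \Pr\left(X > \frac{x' + u (t_u - t_i)}{r}\right)^{r/u} dx' - \int_0^\infty \prod_{i=1}^{u} \Pr\left(X > \frac{x' + u (t_{u+1} - t_i)}{r}\right)^{r/u} dx' \right), \tag{54} \\
&= \int_0^\infty \Pr\left(X > \frac{x'}{r}\right)^{r} dx' + \sum_{u=2}^{r} \left( \int_0^\infty \prod_{i=1}^{u} \Pr\left(X > \frac{x' + u (t_u - t_i)}{r}\right)^{r/u} dx' - \int_0^\infty \prod_{i=1}^{u-1} \Pr\left(X > \frac{x' + (u-1)(t_u - t_i)}{r}\right)^{r/(u-1)} dx' \right), \tag{55} \\
&\le r E[X_{1:r}], \tag{56}
\end{align}

where we get (53) by applying Property 2 to (48). In (54) we express the integral as a difference of two 
integrals from 0 to $\infty$, and perform a change of variables $x = x'/u$. In (55) we rearrange the 
grouping of the terms in the sum; the $u$-th negative integral is put in the $(u+1)$-th term of the 
summation. The first term is equal to $r E[X_{1:r}]$. We use Lemma 10 to show that each term in the 
summation in (55) is non-positive when $\bar{F}_X$ is log-concave. □ 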
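
The two bounds proved above, $E[X] \le E[C] \le r E[X_{1:r}]$ for log-concave $\bar{F}_X$ and the reverse for log-convex $\bar{F}_X$, can be checked numerically by evaluating (45) with the product form (44). The sketch below uses a simple midpoint rule and truncates the infinite integrals at a finite upper limit; the start times, the shifted-exponential and hyperexponential parameters, and all function names are assumptions chosen only for illustration.

```python
import math

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def expected_cost(tail, starts, upper=60.0):
    """E[C] from (45): sum over tasks of the integral of Pr(S > s) from t_u onwards.

    Pr(S > s) is the product in (44); `tail` must return 1 for non-positive arguments.
    The infinite upper limit is truncated at `upper`.
    """
    def s_tail(s):
        return math.prod(tail(s - t) for t in starts)
    return sum(integrate(s_tail, t, upper) for t in starts)

def shifted_exp_tail(x, shift=1.0, rate=1.0):
    """Log-concave example: constant shift plus exponential."""
    return 1.0 if x <= shift else math.exp(-rate * (x - shift))

def hyper_exp_tail(x, p=0.5, rate1=1.0, rate2=10.0):
    """Log-convex example: mixture of two exponentials."""
    return 1.0 if x <= 0 else p * math.exp(-rate1 * x) + (1 - p) * math.exp(-rate2 * x)

starts = [0.0, 0.5, 1.0]   # relative start times t_1, t_2, t_3 (illustrative)
r = len(starts)
results = {}
for name, tail in [("log-concave", shifted_exp_tail), ("log-convex", hyper_exp_tail)]:
    mean_x = integrate(tail, 0.0, 60.0)                              # E[X]
    r_mean_min = r * integrate(lambda x: tail(x) ** r, 0.0, 60.0)    # r * E[X_{1:r}]
    results[name] = (mean_x, expected_cost(tail, starts), r_mean_min)

for name, (mean_x, cost, r_mean_min) in results.items():
    print(name, round(mean_x, 4), round(cost, 4), round(r_mean_min, 4))
```

In both cases $E[C]$ for the staggered start times lies between the cost of no redundancy, $E[X]$, and the cost of starting all $r$ tasks together, $r E[X_{1:r}]$; only the order of the two endpoints changes.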

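Lemma 9. If $\bar{F}_X$ is log-concave, 

\[
\prod_{i=1}^{u} \Pr\left(X > \frac{x'}{u} + t_u - t_i\right) \ge \prod_{i=1}^{u-1} \Pr\left(X > \frac{x'}{u-1} + t_u - t_i\right). \tag{57}
\]

The inequality is reversed for log-convex $\bar{F}_X$. 

Proof of Lemma 9. We bound the left hand side expression as follows. 

\begin{align}
\prod_{i=1}^{u} \Pr\left(X > \frac{x'}{u} + t_u - t_i\right) &= \Pr(S > t_u) \prod_{i=1}^{u} \Pr\left(X > \frac{x'}{u} + t_u - t_i \,\Big|\, X > t_u - t_i\right), \tag{58} \\
&= \Pr(S > t_u) \, \Pr\left(X > \frac{x'}{u}\right) \prod_{i=1}^{u-1} \Pr\left(X > \frac{x'}{u} + t_u - t_i \,\Big|\, X > t_u - t_i\right), \tag{59} \\
&\ge \Pr(S > t_u) \prod_{i=1}^{u-1} \Pr\left(X > \frac{x'}{u} + t_u - t_i \,\Big|\, X > t_u - t_i\right)^{\frac{u}{u-1}}, \tag{60} \\
&\ge \Pr(S > t_u) \prod_{i=1}^{u-1} \Pr\left(X > \frac{x'}{u-1} + t_u - t_i \,\Big|\, X > t_u - t_i\right), \tag{61} \\
&= \prod_{i=1}^{u-1} \Pr\left(X > \frac{x'}{u-1} + t_u - t_i\right), \tag{62}
\end{align}

where we use Property 3 to get (60). The inequality in (61) follows from applying Property 2 to the 
conditional distribution $\Pr(Y > x'/u) = \Pr(X > x'/u + t_u - t_i \mid X > t_u - t_i)$, which is also 
log-concave. For log-convex $\bar{F}_X$ all the inequalities can be reversed. □ 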


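Lemma 10. If $\bar{F}_X$ is log-concave, 

\[
\prod_{i=1}^{u} \Pr\left(X > \frac{x' + u (t_u - t_i)}{r}\right)^{r/u} \le \prod_{i=1}^{u-1} \Pr\left(X > \frac{x' + (u-1)(t_u - t_i)}{r}\right)^{r/(u-1)}. \tag{63}
\]

The inequality is reversed for log-convex $\bar{F}_X$. 

Proof of Lemma 10. We start by simplifying the left-hand side expression, raised to the power 
$(u-1)/r$. 

\begin{align}
\prod_{i=1}^{u} \Pr\left(X > \frac{x' + u (t_u - t_i)}{r}\right)^{\frac{u-1}{u}} &= \Pr\left(X > \frac{x'}{r}\right)^{\frac{u-1}{u}} \prod_{i=1}^{u-1} \Pr\left(X > \frac{x' + u (t_u - t_i)}{r}\right)^{\frac{u-1}{u}}, \tag{64} \\
&= \prod_{i=1}^{u-1} \Pr\left(X > \frac{x'}{r}\right)^{\frac{1}{u}} \Pr\left(X > \frac{x' + u (t_u - t_i)}{r}\right)^{\frac{u-1}{u}}, \tag{65} \\
&\le \prod_{i=1}^{u-1} \Pr\left(X > \frac{x' + (u-1)(t_u - t_i)}{r}\right), \tag{66}
\end{align}

where (66) follows from the log-concavity of $\Pr(X > x)$ and Jensen’s inequality (Property 1) with 
$\theta = 1/u$. Raising both sides to the power $r/(u-1)$ gives (63). The inequality is reversed for 
log-convex $\bar{F}_X$. □ 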


C PROOFS FOR GENERAL k 

Proof of Theorem 4. To find the upper bound on latency, we consider a related queueing system 
called the split-merge queueing system. In the split-merge system all the queues are blocked and 
cannot serve subsequent jobs until k out of n tasks of the current job are complete. Thus the latency 
of the split-merge system serves as an upper bound on that of the fork-join system. In the split-merge 
system we observe that jobs are served one-by-one, and no two jobs are served simultaneously. So it is 
equivalent to an M/G/1 queue with Poisson arrival rate $\lambda$, and service time $X_{k:n}$. The expected 
latency of an M/G/1 queue is given by the Pollaczek-Khinchine formula [Gallager 2013, Chapter 5], and it 
reduces to the upper bound in (16). 

To find the lower bound we consider a system where the job requires $k$ out of $n$ tasks to complete, 
but all jobs arriving before it require only 1 task to finish. Then the expected waiting time in queue is 
equal to the second term in (16) with $k$ set to 1. Adding the expected service time $E[X_{k:n}]$ to this 
lower bound on expected waiting time, we get the lower bound (17) on the expected latency. □ 
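
The split-merge upper bound can be sketched in code by applying the Pollaczek-Khinchine formula to the service time $X_{k:n}$, as described in the proof. Equation (16) itself is not reproduced in this appendix, so the sketch below relies on the standard form of the formula, $E[T] = E[S] + \lambda E[S^2] / (2 (1 - \lambda E[S]))$, and assumes exponential service times with rate $\mu$, for which the moments of $X_{k:n}$ are known in closed form. Function names and parameters are hypothetical.

```python
def pk_latency(arrival_rate, mean_service, second_moment_service):
    """Expected M/G/1 latency (waiting plus service) via the Pollaczek-Khinchine formula."""
    load = arrival_rate * mean_service
    if load >= 1.0:
        raise ValueError("unstable queue: arrival_rate * mean_service must be below 1")
    waiting = arrival_rate * second_moment_service / (2.0 * (1.0 - load))
    return mean_service + waiting

def exp_order_stat_moments(n, k, rate):
    """Mean and second moment of X_{k:n} for n i.i.d. Exp(rate) service times.

    X_{k:n} is a sum of independent exponentials with rates n*rate, (n-1)*rate, ...,
    (n-k+1)*rate, so the means and variances of those stages add up.
    """
    stages = range(n - k + 1, n + 1)
    mean = sum(1.0 / j for j in stages) / rate
    variance = sum(1.0 / j ** 2 for j in stages) / rate ** 2
    return mean, variance + mean ** 2

def split_merge_bound(n, k, rate, arrival_rate):
    """Split-merge upper bound on expected latency, assuming exponential service."""
    mean, second_moment = exp_order_stat_moments(n, k, rate)
    return pk_latency(arrival_rate, mean, second_moment)

print(split_merge_bound(n=1, k=1, rate=2.0, arrival_rate=1.0))
print(split_merge_bound(n=10, k=5, rate=1.0, arrival_rate=0.5))
```

With $n = k = 1$ the bound reduces to the M/M/1 latency $1/(\mu - \lambda)$, which is a convenient sanity check.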

Proof of Lemma 8. This upper bound is a generalization of the bound on the mean response time 
of the (n, n) fork-join system with exponential service time presented in [Nelson and Tantawi 1988]. To 
find the bound, we first observe that the response times experienced by the tasks in the n queues form a 
set of associated random variables [Esary et al. 1967]. Then we use the property of associated random 
variables that their expected maximum is less than that for independent variables with the same 
marginal distributions. Unfortunately, this approach cannot be extended to the $k < n$ case because 
this property of associated variables does not hold for the $k$-th order statistic for $k < n$. □ 

Proof of Theorem 5. A key observation used in proving the cost bounds is that at least $n - k + 1$ 
out of the $n$ tasks of a job $i$ start service at the same time. This is because when the $k$-th task of 
Job $(i - 1)$ finishes, the remaining $n - k$ tasks are canceled immediately. These $n - k + 1$ queues 
start working on the tasks of Job $i$ at the same time. 

To prove the upper bound we divide the $n$ tasks into two groups, the $k - 1$ tasks that can start early, 
and the $n - k + 1$ tasks which start at the same time after the last tasks of the previous job are 
terminated. We consider a constraint that all the $k - 1$ tasks in the first group and 1 of the remaining 
$n - k + 1$ tasks need to be served for completion of the job. This gives an upper bound on the computing 
cost because we are not taking into account the case where more than one task from the second group can 
finish service before the $k - 1$ tasks in the first group. For the $n - k + 1$ tasks in the second group, 
the computing cost is equal to $n - k + 1$ times the time taken for one of them to complete. The computing 
time spent on the first $k - 1$ tasks is at most $(k - 1) E[X]$. Adding this to the second group’s cost, we 
get the upper bound (20). 

We observe that the expected computing cost for the $k$ tasks that finish is at least 
$\sum_{j=1}^{k} E[X_{j:n}]$, which takes into account full diversity of the redundant tasks. Since we need 
$k$ tasks to complete in 
total, at least 1 of the $n - k + 1$ tasks that start simultaneously needs to be served. Thus, the computing 
cost of the $(n - k)$ redundant tasks is at least $(n - k) E[X_{1:n-k+1}]$. Adding this to the lower bound on 
the first group’s cost, we get (21). □ 
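
The upper bound argument above adds $(k - 1) E[X]$ for the first group to $n - k + 1$ times $E[X_{1:n-k+1}]$ for the second group. The sketch below evaluates that sum under the assumption of shifted-exponential service times $X = \Delta + \mathrm{Exp}(\mu)$, for which $E[X_{1:m}] = \Delta + 1/(m\mu)$; equation (20) is not reproduced in this appendix, so the code follows only the verbal description in the proof, and the function names are hypothetical.

```python
def shifted_exp_min_mean(m, shift, rate):
    """E[X_{1:m}] for m i.i.d. copies of X = shift + Exp(rate).

    The minimum is the shift plus an exponential with rate m * rate.
    """
    return shift + 1.0 / (m * rate)

def cost_upper_bound(n, k, shift, rate):
    """(k - 1) * E[X] plus (n - k + 1) * E[X_{1:n-k+1}], as described in the proof."""
    mean_x = shift + 1.0 / rate
    m = n - k + 1
    return (k - 1) * mean_x + m * shifted_exp_min_mean(m, shift, rate)

k = 4
for n in (4, 6, 8):
    # For this distribution the sum simplifies to n * shift + k / rate.
    print(n, cost_upper_bound(n, k, shift=1.0, rate=1.0))
```

With $n = k$ the expression equals $k E[X]$, the cost without redundancy, and with zero shift it stays at $k/\mu$ for every $n$.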

Proof of Theorem 6. Since exactly k tasks are served, and others are cancelled before they start 
service, it follows that the expected computing cost E [C] = kE [X]. In the sequel, we find an upper 
bound on the latency of the (n, k) fork-early-cancel system. 

First observe that in the (n, k) fork-early-cancel system, the $n - k$ redundant tasks that are canceled 
early help find the $k$ shortest queues. The expected task arrival rate at each server is $\lambda k/n$, 
which excludes the redundant tasks that are canceled before they start service. 

Consider an (n, k, k) partial fork system without redundancy, where the $k$ tasks of each job are 
assigned to $k$ out of $n$ queues chosen uniformly at random. The job exits the system when all $k$ tasks 
are complete. The expected task arrival rate at each server is $\lambda k/n$, same as the (n, k) 
fork-early-cancel system. However, the (n, k) fork-early-cancel system gives lower latency because having 
the $n - k$ redundant tasks provides diversity and helps find the $k$ shortest queues. Thus the latency of 
the (n, k, k) partial-fork-join system is bounded below by that of the (n, k) fork-early-cancel system. 

Now let us upper bound the expected latency of the partial fork system. Each queue has arrival 
rate $\lambda k/n$, and service time distribution $F_X$. Using the approach in [Nelson and Tantawi 1988] we 
can show that the response times (waiting plus service time) $R_i$, $1 \le i \le k$, of the $k$ queues 
serving each job form a set of associated random variables. Then by the property that the expected maximum 
of $k$ associated random variables is less than the expected maximum of $k$ independent variables with the 
same marginal distributions we can show that, 


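\begin{align}
E[T] &\le E\left[\max(R_1, R_2, \dots, R_k)\right] \tag{67} \\
&\le E\left[\max(\tilde{R}_1, \tilde{R}_2, \dots, \tilde{R}_k)\right], \tag{68}
\end{align}

where $\tilde{R}_1, \dots, \tilde{R}_k$ denote independent random variables, each with the same marginal 
distribution as the response time $R$ of a single queue. 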

The expected maximum can be numerically evaluated from the distribution of $R$. From the transform 
analysis given in [Harchol-Balter 2013, Chapter 25], we know that the Laplace-Stieltjes transform 
$R(s)$ of the probability density of $R$ is the same as (19), but with $\lambda$ replaced by $\lambda k/n$. □ 
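
For one special case the expected maximum has a simple closed form, which gives a feel for the bound. The sketch below assumes exponential service times with rate $\mu$, so that each queue is an M/M/1 queue with arrival rate $\lambda k/n$ and its response time is exponential with rate $\mu - \lambda k/n$; the expected maximum of $k$ independent such variables is the $k$-th harmonic number divided by that rate. This is an illustration under that assumption only, not the general evaluation via the transform in (19), and the function name is hypothetical.

```python
def early_cancel_latency_bound(n, k, service_rate, arrival_rate):
    """Expected maximum of k independent M/M/1 response times at per-queue load lambda*k/n.

    Assumes exponential service, so each response time is Exp(service_rate - lambda*k/n).
    """
    per_queue_arrivals = arrival_rate * k / n
    response_rate = service_rate - per_queue_arrivals
    if response_rate <= 0:
        raise ValueError("unstable queue: arrival_rate * k / n must be below service_rate")
    harmonic = sum(1.0 / j for j in range(1, k + 1))
    return harmonic / response_rate

print(early_cancel_latency_bound(n=4, k=2, service_rate=1.0, arrival_rate=1.0))
print(early_cancel_latency_bound(n=10, k=2, service_rate=1.0, arrival_rate=1.0))
```

Increasing $n$ for fixed $k$ lowers the per-queue arrival rate $\lambda k/n$ and therefore tightens this bound.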